[gcc-6] 229/401: * Update the Linaro support to the 6-2016.10 snapshot.
Ximin Luo
infinity0@debian.org
Wed Apr 5 15:49:41 UTC 2017
This is an automated email from the git hooks/post-receive script.
infinity0 pushed a commit to branch pu/reproducible_builds
in repository gcc-6.
commit ff84db0058027365947defd8892952d2eadff65f
Author: doko <doko@6ca36cf4-e1d1-0310-8c6f-e303bb2178ca>
Date: Tue Oct 18 08:05:08 2016 +0000
* Update the Linaro support to the 6-2016.10 snapshot.
git-svn-id: svn://anonscm.debian.org/gcccvs/branches/sid/gcc-6@9001 6ca36cf4-e1d1-0310-8c6f-e303bb2178ca
---
debian/changelog | 1 +
debian/patches/gcc-linaro-doc.diff | 311 +-
debian/patches/gcc-linaro-no-macros.diff | 4 +-
debian/patches/gcc-linaro.diff | 132172 ++++++++++++++++++++++++++--
4 files changed, 125659 insertions(+), 6829 deletions(-)
diff --git a/debian/changelog b/debian/changelog
index f2aecd0..f19fa6d 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -11,6 +11,7 @@ gcc-6 (6.2.0-7) UNRELEASED; urgency=medium
* Configure with --enable-default-pie and pass -z now when pie is enabled;
on amd64 arm64 armel armhf i386 mips mipsel mips64el ppc64el s390x.
Closes: #835148.
+ * Update the Linaro support to the 6-2016.10 snapshot.
[ Aurelien Jarno ]
* Enable logwatch on mips64el.
diff --git a/debian/patches/gcc-linaro-doc.diff b/debian/patches/gcc-linaro-doc.diff
index dd2b39c..f722c32 100644
--- a/debian/patches/gcc-linaro-doc.diff
+++ b/debian/patches/gcc-linaro-doc.diff
@@ -1,4 +1,4 @@
-# DP: Changes for the Linaro 6-2016.08 release (documentation).
+# DP: Changes for the Linaro 6-2016.10 release (documentation).
--- a/src/gcc/doc/cpp.texi
+++ b/src/gcc/doc/cpp.texi
@@ -11,6 +11,34 @@
minor version and patch level are reset. If you wish to use the
predefined macros directly in the conditional, you will need to write it
like this:
+--- a/src/gcc/doc/fragments.texi
++++ b/src/gcc/doc/fragments.texi
+@@ -156,15 +156,16 @@ variants. And for some targets it is better to reuse an existing multilib
+ than to fall back to default multilib when there is no corresponding multilib.
+ This can be done by adding reuse rules to @code{MULTILIB_REUSE}.
+
+-A reuse rule is comprised of two parts connected by equality sign. The left part
+-is option set used to build multilib and the right part is option set that will
+-reuse this multilib. The order of options in the left part matters and should be
+-same with those specified in @code{MULTILIB_REQUIRED} or aligned with order in
+-@code{MULTILIB_OPTIONS}. There is no such limitation for options in right part
+-as we don't build multilib from them. But the equality sign in both parts should
+-be replaced with period.
+-
+-The @code{MULTILIB_REUSE} is different from @code{MULTILIB_MATCHES} in that it
++A reuse rule is comprised of two parts connected by equality sign. The left
++part is the option set used to build multilib and the right part is the option
++set that will reuse this multilib. Both parts should only use options
++specified in @code{MULTILIB_OPTIONS} and the equality signs found in options
++name should be replaced with periods. The order of options in the left part
++matters and should be same with those specified in @code{MULTILIB_REQUIRED} or
++aligned with the order in @code{MULTILIB_OPTIONS}. There is no such limitation
++for options in the right part as we don't build multilib from them.
++
++@code{MULTILIB_REUSE} is different from @code{MULTILIB_MATCHES} in that it
+ sets up relations between two option sets rather than two options. Here is an
+ example to demo how we reuse libraries built in Thumb mode for applications built
+ in ARM mode:
--- a/src/gcc/doc/invoke.texi
+++ b/src/gcc/doc/invoke.texi
@@ -573,6 +573,8 @@ Objective-C and Objective-C++ Dialects}.
@@ -130,18 +158,38 @@
@item -march=@var{name}
@opindex march
-@@ -12957,17 +12985,15 @@ Specify the name of the target processor for which GCC should tune the
+@@ -12929,10 +12957,13 @@ more feature modifiers. This option has the form
+ @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
+
+ The permissible values for @var{arch} are @samp{armv8-a},
+-@samp{armv8.1-a} or @var{native}.
++@samp{armv8.1-a}, @samp{armv8.2-a} or @var{native}.
++
++The value @samp{armv8.2-a} implies @samp{armv8.1-a} and enables compiler
++support for the ARMv8.2-A architecture extensions.
+
+ The value @samp{armv8.1-a} implies @samp{armv8-a} and enables compiler
+-support for the ARMv8.1 architecture extension. In particular, it
++support for the ARMv8.1-A architecture extension. In particular, it
+ enables the @samp{+crc} and @samp{+lse} features.
+
+ The value @samp{native} is available on native AArch64 GNU/Linux and
+@@ -12956,18 +12987,18 @@ processors implementing the target architecture.
+ Specify the name of the target processor for which GCC should tune the
performance of the code. Permissible values for this option are:
@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
- @samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
+-@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}.
-+@samp{xgene1}, @samp{vulcan}, @samp{cortex-a57.cortex-a53},
-+@samp{cortex-a72.cortex-a53}, @samp{native}.
++@samp{cortex-a72}, @samp{cortex-a73}, @samp{exynos-m1}, @samp{qdf24xx},
++@samp{thunderx}, @samp{xgene1}, @samp{vulcan}, @samp{cortex-a57.cortex-a53},
++@samp{cortex-a72.cortex-a53}, @samp{cortex-a73.cortex-a35},
++@samp{cortex-a73.cortex-a53}, @samp{native}.
-Additionally, this option can specify that GCC should tune the performance
-of the code for a big.LITTLE system. Permissible values for this
-option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
-+The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}
++The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
++@samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53}
+specify that GCC should tune for a big.LITTLE system.
Additionally on native AArch64 GNU/Linux systems the value
@@ -154,7 +202,7 @@
Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
are specified, the code is tuned to perform well across a range
-@@ -12987,12 +13013,6 @@ documented in the sub-section on
+@@ -12987,12 +13018,6 @@ documented in the sub-section on
Feature Modifiers}. Where conflicting feature modifiers are
specified, the right-most feature is used.
@@ -167,7 +215,7 @@
GCC uses @var{name} to determine what kind of instructions it can emit when
generating assembly code (as if by @option{-march}) and to determine
the target processor for which to tune for performance (as if
-@@ -13010,11 +13030,11 @@ across releases.
+@@ -13010,11 +13035,11 @@ across releases.
This option is only intended to be useful when developing GCC.
@item -mpc-relative-literal-loads
@@ -184,7 +232,12 @@
@end table
-@@ -13045,9 +13065,9 @@ Enable Large System Extension instructions. This is on by default for
+@@ -13042,12 +13067,14 @@ instructions. This is on by default for all possible values for options
+ @item lse
+ Enable Large System Extension instructions. This is on by default for
+ @option{-march=armv8.1-a}.
++@item fp16
++Enable FP16 extension. This also enables floating-point instructions.
@end table
@@ -197,7 +250,107 @@
@node Adapteva Epiphany Options
@subsection Adapteva Epiphany Options
-@@ -18082,7 +18102,7 @@ IEEE 754 floating-point data.
+@@ -13967,21 +13994,42 @@ name to determine what kind of instructions it can emit when generating
+ assembly code. This option can be used in conjunction with or instead
+ of the @option{-mcpu=} option. Permissible names are: @samp{armv2},
+ @samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
+-@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
+-@samp{armv6}, @samp{armv6j},
+-@samp{armv6t2}, @samp{armv6z}, @samp{armv6kz}, @samp{armv6-m},
+-@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7e-m},
++@samp{armv5}, @samp{armv5e}, @samp{armv5t}, @samp{armv5te},
++@samp{armv6}, @samp{armv6-m}, @samp{armv6j}, @samp{armv6k},
++@samp{armv6kz}, @samp{armv6s-m},
++@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk},
++@samp{armv7}, @samp{armv7-a}, @samp{armv7-m}, @samp{armv7-r}, @samp{armv7e-m},
+ @samp{armv7ve}, @samp{armv8-a}, @samp{armv8-a+crc}, @samp{armv8.1-a},
+-@samp{armv8.1-a+crc}, @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}.
++@samp{armv8.1-a+crc}, @samp{armv8-m.base}, @samp{armv8-m.main},
++@samp{armv8-m.main+dsp}, @samp{iwmmxt}, @samp{iwmmxt2}.
++
++Architecture revisions older than @samp{armv4t} are deprecated.
+
+-Architecture revisions older than @option{armv4t} are deprecated.
++@option{-march=armv6s-m} is the @samp{armv6-m} architecture with support for
++the (now mandatory) SVC instruction.
+
+-@option{-march=armv7ve} is the armv7-a architecture with virtualization
++@option{-march=armv6zk} is an alias for @samp{armv6kz}, existing for backwards
++compatibility.
++
++@option{-march=armv7ve} is the @samp{armv7-a} architecture with virtualization
+ extensions.
+
+ @option{-march=armv8-a+crc} enables code generation for the ARMv8-A
+ architecture together with the optional CRC32 extensions.
+
++@option{-march=armv8.1-a} enables compiler support for the ARMv8.1-A
++architecture. This also enables the features provided by
++@option{-march=armv8-a+crc}.
++
++@option{-march=armv8.2-a} enables compiler support for the ARMv8.2-A
++architecture. This also enables the features provided by
++@option{-march=armv8.1-a}.
++
++@option{-march=armv8.2-a+fp16} enables compiler support for the
++ARMv8.2-A architecture with the optional FP16 instructions extension.
++This also enables the features provided by @option{-march=armv8.1-a}
++and implies @option{-mfp16-format=ieee}.
++
+ @option{-march=native} causes the compiler to auto-detect the architecture
+ of the build computer. At present, this feature is only supported on
+ GNU/Linux, and not all architectures are recognized. If the auto-detect
+@@ -14013,7 +14061,7 @@ Permissible names are: @samp{arm2}, @samp{arm250},
+ @samp{generic-armv7-a}, @samp{cortex-a5}, @samp{cortex-a7}, @samp{cortex-a8},
+ @samp{cortex-a9}, @samp{cortex-a12}, @samp{cortex-a15}, @samp{cortex-a17},
+ @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
+-@samp{cortex-a72}, @samp{cortex-r4},
++@samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-r4},
+ @samp{cortex-r4f}, @samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-r8},
+ @samp{cortex-m7},
+ @samp{cortex-m4},
+@@ -14035,7 +14083,8 @@ Permissible names are: @samp{arm2}, @samp{arm250},
+ Additionally, this option can specify that GCC should tune the performance
+ of the code for a big.LITTLE system. Permissible names are:
+ @samp{cortex-a15.cortex-a7}, @samp{cortex-a17.cortex-a7},
+-@samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
++@samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
++@samp{cortex-a72.cortex-a35}, @samp{cortex-a73.cortex-a53}.
+
+ @option{-mtune=generic-@var{arch}} specifies that GCC should tune the
+ performance for a blend of processors within architecture @var{arch}.
+@@ -14165,9 +14214,12 @@ otherwise the default is @samp{R10}.
+
+ @item -mpic-data-is-text-relative
+ @opindex mpic-data-is-text-relative
+-Assume that each data segments are relative to text segment at load time.
+-Therefore, it permits addressing data using PC-relative operations.
+-This option is on by default for targets other than VxWorks RTP.
++Assume that the displacement between the text and data segments is fixed
++at static link time. This permits using PC-relative addressing
++operations to access data known to be in the data segment. For
++non-VxWorks RTP targets, this option is enabled by default. When
++disabled on such targets, it will enable @option{-msingle-pic-base} by
++default.
+
+ @item -mpoke-function-name
+ @opindex mpoke-function-name
+@@ -14277,10 +14329,10 @@ generating these instructions. This option is enabled by default when
+ @opindex mno-unaligned-access
+ Enables (or disables) reading and writing of 16- and 32- bit values
+ from addresses that are not 16- or 32- bit aligned. By default
+-unaligned access is disabled for all pre-ARMv6 and all ARMv6-M
+-architectures, and enabled for all other architectures. If unaligned
+-access is not enabled then words in packed data structures are
+-accessed a byte at a time.
++unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for
++ARMv8-M Baseline architectures, and enabled for all other
++architectures. If unaligned access is not enabled then words in packed
++data structures are accessed a byte at a time.
+
+ The ARM attribute @code{Tag_CPU_unaligned_access} is set in the
+ generated object file to either true or false, depending upon the
+@@ -18082,7 +18134,7 @@ IEEE 754 floating-point data.
The @option{-mnan=legacy} option selects the legacy encoding. In this
case quiet NaNs (qNaNs) are denoted by the first bit of their trailing
@@ -217,3 +370,141 @@
raised and a quiet @code{NaN} is returned.
All operands have mode @var{m}, which is a scalar or vector
+--- a/src/gcc/doc/sourcebuild.texi
++++ b/src/gcc/doc/sourcebuild.texi
+@@ -1555,6 +1555,16 @@ options. Some multilibs may be incompatible with these options.
+ ARM Target supports @code{-mfpu=neon-vfpv4 -mfloat-abi=softfp} or compatible
+ options. Some multilibs may be incompatible with these options.
+
++@item arm_fp16_ok
++@anchor{arm_fp16_ok}
++Target supports options to generate VFP half-precision floating-point
++instructions. Some multilibs may be incompatible with these
++options. This test is valid for ARM only.
++
++@item arm_fp16_hw
++Target supports executing VFP half-precision floating-point
++instructions. This test is valid for ARM only.
++
+ @item arm_neon_fp16_ok
+ @anchor{arm_neon_fp16_ok}
+ ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
+@@ -1565,6 +1575,13 @@ options, including @code{-mfp16-format=ieee} if necessary to obtain the
+ Test system supports executing Neon half-precision float instructions.
+ (Implies previous.)
+
++@item arm_fp16_alternative_ok
++ARM target supports the ARM FP16 alternative format. Some multilibs
++may be incompatible with the options needed.
++
++@item arm_fp16_none_ok
++ARM target supports specifying none as the ARM FP16 format.
++
+ @item arm_thumb1_ok
+ ARM target generates Thumb-1 code for @code{-mthumb}.
+
+@@ -1589,6 +1606,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
+ Some multilibs may be incompatible with these options.
+
+ @item arm_v8_1a_neon_ok
++@anchor{arm_v8_1a_neon_ok}
+ ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
+ Some multilibs may be incompatible with these options.
+
+@@ -1597,10 +1615,43 @@ ARM target supports executing ARMv8.1 Adv.SIMD instructions. Some
+ multilibs may be incompatible with the options needed. Implies
+ arm_v8_1a_neon_ok.
+
++@item arm_acq_rel
++ARM target supports acquire-release instructions.
++
++@item arm_v8_2a_fp16_scalar_ok
++@anchor{arm_v8_2a_fp16_scalar_ok}
++ARM target supports options to generate instructions for ARMv8.2 and
++scalar instructions from the FP16 extension. Some multilibs may be
++incompatible with these options.
++
++@item arm_v8_2a_fp16_scalar_hw
++ARM target supports executing instructions for ARMv8.2 and scalar
++instructions from the FP16 extension. Some multilibs may be
++incompatible with these options. Implies arm_v8_2a_fp16_neon_ok.
++
++@item arm_v8_2a_fp16_neon_ok
++@anchor{arm_v8_2a_fp16_neon_ok}
++ARM target supports options to generate instructions from ARMv8.2 with
++the FP16 extension. Some multilibs may be incompatible with these
++options. Implies arm_v8_2a_fp16_scalar_ok.
++
++@item arm_v8_2a_fp16_neon_hw
++ARM target supports executing instructions from ARMv8.2 with the FP16
++extension. Some multilibs may be incompatible with these options.
++Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
++
+ @item arm_prefer_ldrd_strd
+ ARM target prefers @code{LDRD} and @code{STRD} instructions over
+ @code{LDM} and @code{STM} instructions.
+
++@item arm_thumb1_movt_ok
++ARM target generates Thumb-1 code for @code{-mthumb} with @code{MOVW}
++and @code{MOVT} instructions available.
++
++@item arm_thumb1_cbz_ok
++ARM target generates Thumb-1 code for @code{-mthumb} with
++@code{CBZ} and @code{CBNZ} instructions available.
++
+ @end table
+
+ @subsubsection AArch64-specific attributes
+@@ -2066,6 +2117,28 @@ NEON support. Only ARM targets support this feature, and only then
+ in certain modes; see the @ref{arm_neon_ok,,arm_neon_ok effective target
+ keyword}.
+
++@item arm_fp16
++VFP half-precision floating point support. This does not select the
++FP16 format; for that, use @ref{arm_fp16_ieee,,arm_fp16_ieee} or
++@ref{arm_fp16_alternative,,arm_fp16_alternative} instead. This
++feature is only supported by ARM targets and then only in certain
++modes; see the @ref{arm_fp16_ok,,arm_fp16_ok effective target
++keyword}.
++
++@item arm_fp16_ieee
++@anchor{arm_fp16_ieee}
++ARM IEEE 754-2008 format VFP half-precision floating point support.
++This feature is only supported by ARM targets and then only in certain
++modes; see the @ref{arm_fp16_ok,,arm_fp16_ok effective target
++keyword}.
++
++@item arm_fp16_alternative
++@anchor{arm_fp16_alternative}
++ARM Alternative format VFP half-precision floating point support.
++This feature is only supported by ARM targets and then only in certain
++modes; see the @ref{arm_fp16_ok,,arm_fp16_ok effective target
++keyword}.
++
+ @item arm_neon_fp16
+ NEON and half-precision floating point support. Only ARM targets
+ support this feature, and only then in certain modes; see
+@@ -2075,6 +2148,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
+ arm vfp3 floating point support; see
+ the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target keyword}.
+
++@item arm_v8_1a_neon
++Add options for ARMv8.1 with Adv.SIMD support, if this is supported
++by the target; see the @ref{arm_v8_1a_neon_ok,,arm_v8_1a_neon_ok}
++effective target keyword.
++
++@item arm_v8_2a_fp16_scalar
++Add options for ARMv8.2 with scalar FP16 support, if this is
++supported by the target; see the
++@ref{arm_v8_2a_fp16_scalar_ok,,arm_v8_2a_fp16_scalar_ok} effective
++target keyword.
++
++@item arm_v8_2a_fp16_neon
++Add options for ARMv8.2 with Adv.SIMD FP16 support, if this is
++supported by the target; see the
++@ref{arm_v8_2a_fp16_neon_ok,,arm_v8_2a_fp16_neon_ok} effective target
++keyword.
++
+ @item bind_pic_locally
+ Add the target-specific flags needed to enable functions to bind
+ locally when using pic/PIC passes in the testsuite.
diff --git a/debian/patches/gcc-linaro-no-macros.diff b/debian/patches/gcc-linaro-no-macros.diff
index 9da5f40..6d5a29e 100644
--- a/debian/patches/gcc-linaro-no-macros.diff
+++ b/debian/patches/gcc-linaro-no-macros.diff
@@ -88,8 +88,8 @@ Index: b/src/gcc/LINARO-VERSION
===================================================================
--- a/src/gcc/LINARO-VERSION
+++ /dev/null
-@@ -1 +0,0 @@
--6.1-2016.08~dev
+@@ -1,1 +0,0 @@
+-Snapshot 6.2-2016.10
Index: b/src/gcc/configure.ac
===================================================================
--- a/src/gcc/configure.ac
diff --git a/debian/patches/gcc-linaro.diff b/debian/patches/gcc-linaro.diff
index 3494a03..01cae87 100644
--- a/debian/patches/gcc-linaro.diff
+++ b/debian/patches/gcc-linaro.diff
@@ -1,8 +1,8 @@
-# DP: Changes for the Linaro 6-2016.08 release.
+# DP: Changes for the Linaro 6-2016.10 release.
MSG=$(git log origin/linaro/gcc-6-branch --format=format:"%s" -n 1 --grep "Merge branches"); SVN=${MSG##* }; git log origin/gcc-6-branch --format=format:"%H" -n 1 --grep "gcc-6-branch@${SVN%.}"
-LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac410a5588dd1faf3462a \
+LANG=C git diff 70232cbbcab57eecc73626f3ea0e13bdfa00202d..bc32472ee917a01b63e72dc399c81d26259e78aa \
| egrep -v '^(diff|index) ' \
| filterdiff --strip=1 --addoldprefix=a/src/ --addnewprefix=b/src/ \
| sed 's,a/src//dev/null,/dev/null,'
@@ -10,7 +10,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
--- /dev/null
+++ b/src/gcc/LINARO-VERSION
@@ -0,0 +1 @@
-+6.1-2016.08~dev
++Snapshot 6.2-2016.10
--- a/src/gcc/Makefile.in
+++ b/src/gcc/Makefile.in
@@ -832,10 +832,12 @@ BASEVER := $(srcdir)/BASE-VER # 4.x.y
@@ -46,9 +46,81 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
+--- a/src/gcc/calls.c
++++ b/src/gcc/calls.c
+@@ -194,10 +194,19 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
+ && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
+ ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
+ : memory_address (FUNCTION_MODE, funexp));
+- else if (! sibcallp)
++ else
+ {
+- if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
+- funexp = force_reg (Pmode, funexp);
++ /* funexp could be a SYMBOL_REF represents a function pointer which is
++ of ptr_mode. In this case, it should be converted into address mode
++ to be a valid address for memory rtx pattern. See PR 64971. */
++ if (GET_MODE (funexp) != Pmode)
++ funexp = convert_memory_address (Pmode, funexp);
++
++ if (! sibcallp)
++ {
++ if (!NO_FUNCTION_CSE && optimize && ! flag_no_function_cse)
++ funexp = force_reg (Pmode, funexp);
++ }
+ }
+
+ if (static_chain_value != 0
+--- a/src/gcc/cfg.c
++++ b/src/gcc/cfg.c
+@@ -1064,7 +1064,7 @@ free_original_copy_tables (void)
+ delete bb_copy;
+ bb_copy = NULL;
+ delete bb_original;
+- bb_copy = NULL;
++ bb_original = NULL;
+ delete loop_copy;
+ loop_copy = NULL;
+ delete original_copy_bb_pool;
--- a/src/gcc/config.gcc
+++ b/src/gcc/config.gcc
-@@ -3795,38 +3795,40 @@ case "${target}" in
+@@ -307,7 +307,7 @@ m32c*-*-*)
+ ;;
+ aarch64*-*-*)
+ cpu_type=aarch64
+- extra_headers="arm_neon.h arm_acle.h"
++ extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
+ c_target_objs="aarch64-c.o"
+ cxx_target_objs="aarch64-c.o"
+ extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+@@ -327,7 +327,7 @@ arc*-*-*)
+ arm*-*-*)
+ cpu_type=arm
+ extra_objs="arm-builtins.o aarch-common.o"
+- extra_headers="mmintrin.h arm_neon.h arm_acle.h"
++ extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
+ target_type_format_char='%'
+ c_target_objs="arm-c.o"
+ cxx_target_objs="arm-c.o"
+@@ -1495,7 +1495,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i
+ extra_options="${extra_options} linux-android.opt"
+ # Assume modern glibc if not targeting Android nor uclibc.
+ case ${target} in
+- *-*-*android*|*-*-*uclibc*)
++ *-*-*android*|*-*-*uclibc*|*-*-*musl*)
+ ;;
+ *)
+ default_gnu_indirect_function=yes
+@@ -1564,7 +1564,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu)
+ extra_options="${extra_options} linux-android.opt"
+ # Assume modern glibc if not targeting Android nor uclibc.
+ case ${target} in
+- *-*-*android*|*-*-*uclibc*)
++ *-*-*android*|*-*-*uclibc*|*-*-*musl*)
+ ;;
+ *)
+ default_gnu_indirect_function=yes
+@@ -3806,38 +3806,40 @@ case "${target}" in
# Add extra multilibs
if test "x$with_multilib_list" != x; then
arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
@@ -114,9 +186,36 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
fi
;;
+--- a/src/gcc/config/aarch64/aarch64-arches.def
++++ b/src/gcc/config/aarch64/aarch64-arches.def
+@@ -32,4 +32,5 @@
+
+ AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8)
+ AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1)
++AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2)
+
--- a/src/gcc/config/aarch64/aarch64-builtins.c
+++ b/src/gcc/config/aarch64/aarch64-builtins.c
-@@ -173,6 +173,10 @@ aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+@@ -62,6 +62,7 @@
+ #define si_UP SImode
+ #define sf_UP SFmode
+ #define hi_UP HImode
++#define hf_UP HFmode
+ #define qi_UP QImode
+ #define UP(X) X##_UP
+
+@@ -139,6 +140,10 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned };
+ #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
+ static enum aarch64_type_qualifiers
++aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
++ = { qualifier_unsigned, qualifier_none, qualifier_none };
++#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
++static enum aarch64_type_qualifiers
+ aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_poly, qualifier_poly, qualifier_poly };
+ #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
+@@ -173,6 +178,10 @@ aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_none, qualifier_immediate };
#define TYPES_SHIFTIMM_USS (aarch64_types_shift_to_unsigned_qualifiers)
static enum aarch64_type_qualifiers
@@ -127,9 +226,30 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
#define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers)
+--- a/src/gcc/config/aarch64/aarch64-c.c
++++ b/src/gcc/config/aarch64/aarch64-c.c
+@@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
+ else
+ cpp_undef (pfile, "__ARM_FP");
+
++ aarch64_def_or_undef (TARGET_FP_F16INST,
++ "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile);
++ aarch64_def_or_undef (TARGET_SIMD_F16INST,
++ "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile);
++
+ aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
+ aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
+
--- a/src/gcc/config/aarch64/aarch64-cores.def
+++ b/src/gcc/config/aarch64/aarch64-cores.def
-@@ -49,6 +49,10 @@ AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AA
+@@ -44,13 +44,19 @@ AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AA
+ AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
+ AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
+ AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
++AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
+ AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, "0x53", "0x001")
+-AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
++AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, "0x51", "0x800")
AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1")
AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
@@ -140,6 +260,10 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* V8 big.LITTLE implementations. */
AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
+ AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
+-
++AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd04")
++AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd03")
--- a/src/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/src/gcc/config/aarch64/aarch64-cost-tables.h
@@ -127,6 +127,108 @@ const struct cpu_cost_table thunderx_extra_costs =
@@ -280,6 +404,27 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
CC_MODE (CC_NZ); /* Only N and Z bits of condition flags are valid. */
CC_MODE (CC_Z); /* Only Z bit of condition flags is valid. */
CC_MODE (CC_C); /* Only C bit of condition flags is valid. */
+--- a/src/gcc/config/aarch64/aarch64-option-extensions.def
++++ b/src/gcc/config/aarch64/aarch64-option-extensions.def
+@@ -39,8 +39,8 @@
+ that are required. Their order is not important. */
+
+ /* Enabling "fp" just enables "fp".
+- Disabling "fp" also disables "simd", "crypto". */
+-AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "fp")
++ Disabling "fp" also disables "simd", "crypto" and "fp16". */
++AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp")
+
+ /* Enabling "simd" also enables "fp".
+ Disabling "simd" also disables "crypto". */
+@@ -55,3 +55,7 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
+
+ /* Enabling or disabling "lse" only changes "lse". */
+ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
++
++/* Enabling "fp16" also enables "fp".
++ Disabling "fp16" just disables "fp16". */
++AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
--- a/src/gcc/config/aarch64/aarch64-protos.h
+++ b/src/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,25 @@ struct cpu_branch_cost
@@ -316,8 +461,13 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
int memmov_cost;
int issue_rate;
unsigned int fusible_ops;
-@@ -287,9 +307,12 @@ bool aarch64_cannot_change_mode_class (machine_mode,
- enum reg_class);
+@@ -282,14 +302,14 @@ int aarch64_get_condition_code (rtx);
+ bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
+ int aarch64_branch_cost (bool, bool);
+ enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
+-bool aarch64_cannot_change_mode_class (machine_mode,
+- machine_mode,
+- enum reg_class);
bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
bool aarch64_constant_address_p (rtx);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
@@ -329,7 +479,15 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
bool aarch64_gen_movmemqi (rtx *);
bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
-@@ -335,11 +358,9 @@ machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
+@@ -298,6 +318,7 @@ bool aarch64_is_noplt_call_p (rtx);
+ bool aarch64_label_mentioned_p (rtx);
+ void aarch64_declare_function_name (FILE *, const char*, tree);
+ bool aarch64_legitimate_pic_operand_p (rtx);
++bool aarch64_mask_and_shift_for_ubfiz_p (machine_mode, rtx, rtx);
+ bool aarch64_modes_tieable_p (machine_mode mode1,
+ machine_mode mode2);
+ bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
+@@ -335,11 +356,9 @@ machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
machine_mode);
int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
int aarch64_hard_regno_nregs (unsigned, machine_mode);
@@ -341,7 +499,15 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
rtx aarch64_mask_from_zextract_ops (rtx, rtx);
const char *aarch64_output_move_struct (rtx *operands);
rtx aarch64_return_addr (int, rtx);
-@@ -369,7 +390,6 @@ void aarch64_register_pragmas (void);
+@@ -352,7 +371,6 @@ unsigned aarch64_dbx_register_number (unsigned);
+ unsigned aarch64_trampoline_size (void);
+ void aarch64_asm_output_labelref (FILE *, const char *);
+ void aarch64_cpu_cpp_builtins (cpp_reader *);
+-void aarch64_elf_asm_named_section (const char *, unsigned, tree);
+ const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+ const char * aarch64_output_probe_stack_range (rtx, rtx);
+ void aarch64_err_no_fpadvsimd (machine_mode, const char *);
+@@ -369,7 +387,6 @@ void aarch64_register_pragmas (void);
void aarch64_relayout_simd_types (void);
void aarch64_reset_previous_fndecl (void);
void aarch64_save_restore_target_globals (tree);
@@ -351,31 +517,307 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
void init_aarch64_simd_builtins (void);
--- a/src/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/src/gcc/config/aarch64/aarch64-simd-builtins.def
-@@ -449,3 +449,21 @@
+@@ -41,8 +41,8 @@
+
+ BUILTIN_VDC (COMBINE, combine, 0)
+ BUILTIN_VB (BINOP, pmul, 0)
+- BUILTIN_VALLF (BINOP, fmulx, 0)
+- BUILTIN_VDQF_DF (UNOP, sqrt, 2)
++ BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0)
++ BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
+ BUILTIN_VD_BHSI (BINOP, addp, 0)
+ VAR1 (UNOP, addp, 0, di)
+ BUILTIN_VDQ_BHSI (UNOP, clrsb, 2)
+@@ -234,105 +234,145 @@
+ BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
+
+ /* Implemented by reduc_<maxmin_uns>_scal_<mode> (producing scalar). */
+- BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+- BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
++ BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10)
++ BUILTIN_VDQIF_F16 (UNOP, reduc_smin_scal_, 10)
+ BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
+ BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
+- BUILTIN_VDQF (UNOP, reduc_smax_nan_scal_, 10)
+- BUILTIN_VDQF (UNOP, reduc_smin_nan_scal_, 10)
++ BUILTIN_VHSDF (UNOP, reduc_smax_nan_scal_, 10)
++ BUILTIN_VHSDF (UNOP, reduc_smin_nan_scal_, 10)
+
+- /* Implemented by <maxmin><mode>3.
++ /* Implemented by <maxmin_uns><mode>3.
+ smax variants map to fmaxnm,
+ smax_nan variants map to fmax. */
+ BUILTIN_VDQ_BHSI (BINOP, smax, 3)
+ BUILTIN_VDQ_BHSI (BINOP, smin, 3)
+ BUILTIN_VDQ_BHSI (BINOP, umax, 3)
+ BUILTIN_VDQ_BHSI (BINOP, umin, 3)
+- BUILTIN_VDQF (BINOP, smax_nan, 3)
+- BUILTIN_VDQF (BINOP, smin_nan, 3)
++ BUILTIN_VHSDF_DF (BINOP, smax_nan, 3)
++ BUILTIN_VHSDF_DF (BINOP, smin_nan, 3)
+
+- /* Implemented by <fmaxmin><mode>3. */
+- BUILTIN_VDQF (BINOP, fmax, 3)
+- BUILTIN_VDQF (BINOP, fmin, 3)
++ /* Implemented by <maxmin_uns><mode>3. */
++ BUILTIN_VHSDF_HSDF (BINOP, fmax, 3)
++ BUILTIN_VHSDF_HSDF (BINOP, fmin, 3)
+
+ /* Implemented by aarch64_<maxmin_uns>p<mode>. */
+ BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
+ BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
+ BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
+ BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
+- BUILTIN_VDQF (BINOP, smaxp, 0)
+- BUILTIN_VDQF (BINOP, sminp, 0)
+- BUILTIN_VDQF (BINOP, smax_nanp, 0)
+- BUILTIN_VDQF (BINOP, smin_nanp, 0)
++ BUILTIN_VHSDF (BINOP, smaxp, 0)
++ BUILTIN_VHSDF (BINOP, sminp, 0)
++ BUILTIN_VHSDF (BINOP, smax_nanp, 0)
++ BUILTIN_VHSDF (BINOP, smin_nanp, 0)
+
+ /* Implemented by <frint_pattern><mode>2. */
+- BUILTIN_VDQF (UNOP, btrunc, 2)
+- BUILTIN_VDQF (UNOP, ceil, 2)
+- BUILTIN_VDQF (UNOP, floor, 2)
+- BUILTIN_VDQF (UNOP, nearbyint, 2)
+- BUILTIN_VDQF (UNOP, rint, 2)
+- BUILTIN_VDQF (UNOP, round, 2)
+- BUILTIN_VDQF_DF (UNOP, frintn, 2)
++ BUILTIN_VHSDF (UNOP, btrunc, 2)
++ BUILTIN_VHSDF (UNOP, ceil, 2)
++ BUILTIN_VHSDF (UNOP, floor, 2)
++ BUILTIN_VHSDF (UNOP, nearbyint, 2)
++ BUILTIN_VHSDF (UNOP, rint, 2)
++ BUILTIN_VHSDF (UNOP, round, 2)
++ BUILTIN_VHSDF_DF (UNOP, frintn, 2)
++
++ VAR1 (UNOP, btrunc, 2, hf)
++ VAR1 (UNOP, ceil, 2, hf)
++ VAR1 (UNOP, floor, 2, hf)
++ VAR1 (UNOP, frintn, 2, hf)
++ VAR1 (UNOP, nearbyint, 2, hf)
++ VAR1 (UNOP, rint, 2, hf)
++ VAR1 (UNOP, round, 2, hf)
+
+ /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2. */
++ VAR1 (UNOP, lbtruncv4hf, 2, v4hi)
++ VAR1 (UNOP, lbtruncv8hf, 2, v8hi)
+ VAR1 (UNOP, lbtruncv2sf, 2, v2si)
+ VAR1 (UNOP, lbtruncv4sf, 2, v4si)
+ VAR1 (UNOP, lbtruncv2df, 2, v2di)
+
++ VAR1 (UNOPUS, lbtruncuv4hf, 2, v4hi)
++ VAR1 (UNOPUS, lbtruncuv8hf, 2, v8hi)
+ VAR1 (UNOPUS, lbtruncuv2sf, 2, v2si)
+ VAR1 (UNOPUS, lbtruncuv4sf, 2, v4si)
+ VAR1 (UNOPUS, lbtruncuv2df, 2, v2di)
+
++ VAR1 (UNOP, lroundv4hf, 2, v4hi)
++ VAR1 (UNOP, lroundv8hf, 2, v8hi)
+ VAR1 (UNOP, lroundv2sf, 2, v2si)
+ VAR1 (UNOP, lroundv4sf, 2, v4si)
+ VAR1 (UNOP, lroundv2df, 2, v2di)
+- /* Implemented by l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2. */
++ /* Implemented by l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2. */
++ BUILTIN_GPI_I16 (UNOP, lroundhf, 2)
+ VAR1 (UNOP, lroundsf, 2, si)
+ VAR1 (UNOP, lrounddf, 2, di)
+
++ VAR1 (UNOPUS, lrounduv4hf, 2, v4hi)
++ VAR1 (UNOPUS, lrounduv8hf, 2, v8hi)
+ VAR1 (UNOPUS, lrounduv2sf, 2, v2si)
+ VAR1 (UNOPUS, lrounduv4sf, 2, v4si)
+ VAR1 (UNOPUS, lrounduv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOPUS, lrounduhf, 2)
+ VAR1 (UNOPUS, lroundusf, 2, si)
+ VAR1 (UNOPUS, lroundudf, 2, di)
+
++ VAR1 (UNOP, lceilv4hf, 2, v4hi)
++ VAR1 (UNOP, lceilv8hf, 2, v8hi)
+ VAR1 (UNOP, lceilv2sf, 2, v2si)
+ VAR1 (UNOP, lceilv4sf, 2, v4si)
+ VAR1 (UNOP, lceilv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOP, lceilhf, 2)
+
++ VAR1 (UNOPUS, lceiluv4hf, 2, v4hi)
++ VAR1 (UNOPUS, lceiluv8hf, 2, v8hi)
+ VAR1 (UNOPUS, lceiluv2sf, 2, v2si)
+ VAR1 (UNOPUS, lceiluv4sf, 2, v4si)
+ VAR1 (UNOPUS, lceiluv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOPUS, lceiluhf, 2)
+ VAR1 (UNOPUS, lceilusf, 2, si)
+ VAR1 (UNOPUS, lceiludf, 2, di)
+
++ VAR1 (UNOP, lfloorv4hf, 2, v4hi)
++ VAR1 (UNOP, lfloorv8hf, 2, v8hi)
+ VAR1 (UNOP, lfloorv2sf, 2, v2si)
+ VAR1 (UNOP, lfloorv4sf, 2, v4si)
+ VAR1 (UNOP, lfloorv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOP, lfloorhf, 2)
+
++ VAR1 (UNOPUS, lflooruv4hf, 2, v4hi)
++ VAR1 (UNOPUS, lflooruv8hf, 2, v8hi)
+ VAR1 (UNOPUS, lflooruv2sf, 2, v2si)
+ VAR1 (UNOPUS, lflooruv4sf, 2, v4si)
+ VAR1 (UNOPUS, lflooruv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOPUS, lflooruhf, 2)
+ VAR1 (UNOPUS, lfloorusf, 2, si)
+ VAR1 (UNOPUS, lfloorudf, 2, di)
+
++ VAR1 (UNOP, lfrintnv4hf, 2, v4hi)
++ VAR1 (UNOP, lfrintnv8hf, 2, v8hi)
+ VAR1 (UNOP, lfrintnv2sf, 2, v2si)
+ VAR1 (UNOP, lfrintnv4sf, 2, v4si)
+ VAR1 (UNOP, lfrintnv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOP, lfrintnhf, 2)
+ VAR1 (UNOP, lfrintnsf, 2, si)
+ VAR1 (UNOP, lfrintndf, 2, di)
+
++ VAR1 (UNOPUS, lfrintnuv4hf, 2, v4hi)
++ VAR1 (UNOPUS, lfrintnuv8hf, 2, v8hi)
+ VAR1 (UNOPUS, lfrintnuv2sf, 2, v2si)
+ VAR1 (UNOPUS, lfrintnuv4sf, 2, v4si)
+ VAR1 (UNOPUS, lfrintnuv2df, 2, v2di)
++ BUILTIN_GPI_I16 (UNOPUS, lfrintnuhf, 2)
+ VAR1 (UNOPUS, lfrintnusf, 2, si)
+ VAR1 (UNOPUS, lfrintnudf, 2, di)
+
+ /* Implemented by <optab><fcvt_target><VDQF:mode>2. */
++ VAR1 (UNOP, floatv4hi, 2, v4hf)
++ VAR1 (UNOP, floatv8hi, 2, v8hf)
+ VAR1 (UNOP, floatv2si, 2, v2sf)
+ VAR1 (UNOP, floatv4si, 2, v4sf)
+ VAR1 (UNOP, floatv2di, 2, v2df)
+
++ VAR1 (UNOP, floatunsv4hi, 2, v4hf)
++ VAR1 (UNOP, floatunsv8hi, 2, v8hf)
+ VAR1 (UNOP, floatunsv2si, 2, v2sf)
+ VAR1 (UNOP, floatunsv4si, 2, v4sf)
+ VAR1 (UNOP, floatunsv2di, 2, v2df)
+@@ -352,19 +392,19 @@
+
+ /* Implemented by
+ aarch64_frecp<FRECP:frecp_suffix><mode>. */
+- BUILTIN_GPF (UNOP, frecpe, 0)
+- BUILTIN_GPF (BINOP, frecps, 0)
+- BUILTIN_GPF (UNOP, frecpx, 0)
++ BUILTIN_GPF_F16 (UNOP, frecpe, 0)
++ BUILTIN_GPF_F16 (UNOP, frecpx, 0)
+
+ BUILTIN_VDQ_SI (UNOP, urecpe, 0)
+
+- BUILTIN_VDQF (UNOP, frecpe, 0)
+- BUILTIN_VDQF (BINOP, frecps, 0)
++ BUILTIN_VHSDF (UNOP, frecpe, 0)
++ BUILTIN_VHSDF_HSDF (BINOP, frecps, 0)
+
+ /* Implemented by a mixture of abs2 patterns. Note the DImode builtin is
+ only ever used for the int64x1_t intrinsic, there is no scalar version. */
+ BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
+- BUILTIN_VDQF (UNOP, abs, 2)
++ BUILTIN_VHSDF (UNOP, abs, 2)
++ VAR1 (UNOP, abs, 2, hf)
+
+ BUILTIN_VQ_HSF (UNOP, vec_unpacks_hi_, 10)
+ VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
+@@ -381,7 +421,11 @@
+ BUILTIN_VALL_F16 (STORE1, st1, 0)
+
+ /* Implemented by fma<mode>4. */
+- BUILTIN_VDQF (TERNOP, fma, 4)
++ BUILTIN_VHSDF (TERNOP, fma, 4)
++ VAR1 (TERNOP, fma, 4, hf)
++ /* Implemented by fnma<mode>4. */
++ BUILTIN_VHSDF (TERNOP, fnma, 4)
++ VAR1 (TERNOP, fnma, 4, hf)
+
+ /* Implemented by aarch64_simd_bsl<mode>. */
+ BUILTIN_VDQQH (BSL_P, simd_bsl, 0)
+@@ -436,7 +480,7 @@
+ VAR1 (TERNOP, qtbx4, 0, v8qi)
+ VAR1 (TERNOP, qtbx4, 0, v16qi)
+
+- /* Builtins for ARMv8.1 Adv.SIMD instructions. */
++ /* Builtins for ARMv8.1-A Adv.SIMD instructions. */
+
+ /* Implemented by aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode>. */
+ BUILTIN_VSDQ_HSI (TERNOP, sqrdmlah, 0)
+@@ -449,3 +493,60 @@
/* Implemented by aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>. */
BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlah_laneq, 0)
BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)
+
+ /* Implemented by <FCVT_F2FIXED/FIXED2F:fcvt_fixed_insn><*><*>3. */
-+ BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3)
-+ BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3)
-+ BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3)
-+ BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
++ BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
++ BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
++ BUILTIN_VHSDF_HSDF (SHIFTIMM, fcvtzs, 3)
++ BUILTIN_VHSDF_HSDF (SHIFTIMM_USS, fcvtzu, 3)
++ VAR1 (SHIFTIMM, scvtfsi, 3, hf)
++ VAR1 (SHIFTIMM, scvtfdi, 3, hf)
++ VAR1 (FCVTIMM_SUS, ucvtfsi, 3, hf)
++ VAR1 (FCVTIMM_SUS, ucvtfdi, 3, hf)
++ BUILTIN_GPI (SHIFTIMM, fcvtzshf, 3)
++ BUILTIN_GPI (SHIFTIMM_USS, fcvtzuhf, 3)
+
+ /* Implemented by aarch64_rsqrte<mode>. */
-+ BUILTIN_VALLF (UNOP, rsqrte, 0)
++ BUILTIN_VHSDF_HSDF (UNOP, rsqrte, 0)
+
+ /* Implemented by aarch64_rsqrts<mode>. */
-+ BUILTIN_VALLF (BINOP, rsqrts, 0)
++ BUILTIN_VHSDF_HSDF (BINOP, rsqrts, 0)
+
+ /* Implemented by fabd<mode>3. */
-+ BUILTIN_VALLF (BINOP, fabd, 3)
++ BUILTIN_VHSDF_HSDF (BINOP, fabd, 3)
+
+ /* Implemented by aarch64_faddp<mode>. */
-+ BUILTIN_VDQF (BINOP, faddp, 0)
++ BUILTIN_VHSDF (BINOP, faddp, 0)
++
++ /* Implemented by aarch64_cm<optab><mode>. */
++ BUILTIN_VHSDF_HSDF (BINOP_USS, cmeq, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, cmge, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, cmgt, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, cmle, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, cmlt, 0)
++
++ /* Implemented by neg<mode>2. */
++ BUILTIN_VHSDF_HSDF (UNOP, neg, 2)
++
++ /* Implemented by aarch64_fac<optab><mode>. */
++ BUILTIN_VHSDF_HSDF (BINOP_USS, faclt, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, facle, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, facgt, 0)
++ BUILTIN_VHSDF_HSDF (BINOP_USS, facge, 0)
++
++ /* Implemented by sqrt<mode>2. */
++ VAR1 (UNOP, sqrt, 2, hf)
++
++ /* Implemented by <optab><mode>hf2. */
++ VAR1 (UNOP, floatdi, 2, hf)
++ VAR1 (UNOP, floatsi, 2, hf)
++ VAR1 (UNOP, floathi, 2, hf)
++ VAR1 (UNOPUS, floatunsdi, 2, hf)
++ VAR1 (UNOPUS, floatunssi, 2, hf)
++ VAR1 (UNOPUS, floatunshi, 2, hf)
++ BUILTIN_GPI_I16 (UNOP, fix_trunchf, 2)
++ BUILTIN_GPI (UNOP, fix_truncsf, 2)
++ BUILTIN_GPI (UNOP, fix_truncdf, 2)
++ BUILTIN_GPI_I16 (UNOPUS, fixuns_trunchf, 2)
++ BUILTIN_GPI (UNOPUS, fixuns_truncsf, 2)
++ BUILTIN_GPI (UNOPUS, fixuns_truncdf, 2)
+\ No newline at end of file
--- a/src/gcc/config/aarch64/aarch64-simd.md
+++ b/src/gcc/config/aarch64/aarch64-simd.md
-@@ -371,18 +371,18 @@
+@@ -351,7 +351,7 @@
+ operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
+ }
+- [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
++ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
+ )
+
+ (define_insn "*aarch64_mul3_elt_<vswap_width_name><mode>"
+@@ -371,33 +371,33 @@
[(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
)
@@ -395,23 +837,38 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- "fmul\\t%0.2d, %1.2d, %2.d[0]"
- [(set_attr "type" "neon_fp_mul_d_scalar_q")]
+ "<f>mul\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]";
-+ [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
++ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
)
-(define_insn "aarch64_rsqrte_<mode>2"
+- [(set (match_operand:VALLF 0 "register_operand" "=w")
+- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
+(define_insn "aarch64_rsqrte<mode>"
- [(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
++ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
++ (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")]
UNSPEC_RSQRTE))]
-@@ -390,7 +390,7 @@
+ "TARGET_SIMD"
"frsqrte\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
- [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")])
+- [(set_attr "type" "neon_fp_rsqrte_<Vetype><q>")])
++ [(set_attr "type" "neon_fp_rsqrte_<stype><q>")])
-(define_insn "aarch64_rsqrts_<mode>3"
+- [(set (match_operand:VALLF 0 "register_operand" "=w")
+- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
+- (match_operand:VALLF 2 "register_operand" "w")]
+- UNSPEC_RSQRTS))]
+(define_insn "aarch64_rsqrts<mode>"
++ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
++ (unspec:VHSDF_HSDF [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
++ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
++ UNSPEC_RSQRTS))]
+ "TARGET_SIMD"
+ "frsqrts\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
+- [(set_attr "type" "neon_fp_rsqrts_<Vetype><q>")])
++ [(set_attr "type" "neon_fp_rsqrts_<stype><q>")])
+
+ (define_expand "rsqrt<mode>2"
[(set (match_operand:VALLF 0 "register_operand" "=w")
- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
- (match_operand:VALLF 2 "register_operand" "w")]
@@ -405,7 +405,7 @@
UNSPEC_RSQRT))]
"TARGET_SIMD"
@@ -421,7 +878,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
DONE;
})
-@@ -474,23 +474,14 @@
+@@ -474,24 +474,15 @@
[(set_attr "type" "neon_arith_acc<q>")]
)
@@ -430,7 +887,13 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- (abs:VDQF (minus:VDQF
- (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w"))))]
-- "TARGET_SIMD"
++(define_insn "fabd<mode>3"
++ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
++ (abs:VHSDF_HSDF
++ (minus:VHSDF_HSDF
++ (match_operand:VHSDF_HSDF 1 "register_operand" "w")
++ (match_operand:VHSDF_HSDF 2 "register_operand" "w"))))]
+ "TARGET_SIMD"
- "fabd\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
- [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
-)
@@ -440,27 +903,129 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- (abs:GPF (minus:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w"))))]
-+(define_insn "fabd<mode>3"
-+ [(set (match_operand:VALLF 0 "register_operand" "=w")
-+ (abs:VALLF
-+ (minus:VALLF
-+ (match_operand:VALLF 1 "register_operand" "w")
-+ (match_operand:VALLF 2 "register_operand" "w"))))]
- "TARGET_SIMD"
+- "TARGET_SIMD"
- "fabd\t%<s>0, %<s>1, %<s>2"
+- [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
+ "fabd\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
- [(set_attr "type" "neon_fp_abd_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_abd_<stype><q>")]
)
-@@ -1509,7 +1500,19 @@
- [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
+ (define_insn "and<mode>3"
+@@ -555,6 +546,49 @@
+ [(set_attr "type" "neon_from_gp<q>, neon_ins<q>, neon_load1_1reg<q>")]
+ )
+
++(define_insn "*aarch64_simd_vec_copy_lane<mode>"
++ [(set (match_operand:VALL 0 "register_operand" "=w")
++ (vec_merge:VALL
++ (vec_duplicate:VALL
++ (vec_select:<VEL>
++ (match_operand:VALL 3 "register_operand" "w")
++ (parallel
++ [(match_operand:SI 4 "immediate_operand" "i")])))
++ (match_operand:VALL 1 "register_operand" "0")
++ (match_operand:SI 2 "immediate_operand" "i")))]
++ "TARGET_SIMD"
++ {
++ int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
++ operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
++ operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
++
++ return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
++ }
++ [(set_attr "type" "neon_ins<q>")]
++)
++
++(define_insn "*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>"
++ [(set (match_operand:VALL 0 "register_operand" "=w")
++ (vec_merge:VALL
++ (vec_duplicate:VALL
++ (vec_select:<VEL>
++ (match_operand:<VSWAP_WIDTH> 3 "register_operand" "w")
++ (parallel
++ [(match_operand:SI 4 "immediate_operand" "i")])))
++ (match_operand:VALL 1 "register_operand" "0")
++ (match_operand:SI 2 "immediate_operand" "i")))]
++ "TARGET_SIMD"
++ {
++ int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
++ operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
++ operands[4] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
++ INTVAL (operands[4])));
++
++ return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
++ }
++ [(set_attr "type" "neon_ins<q>")]
++)
++
+ (define_insn "aarch64_simd_lshr<mode>"
+ [(set (match_operand:VDQ_I 0 "register_operand" "=w")
+ (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
+@@ -1071,10 +1105,10 @@
+
+ ;; Pairwise FP Max/Min operations.
+ (define_insn "aarch64_<maxmin_uns>p<mode>"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")]
+- FMAXMINV))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")]
++ FMAXMINV))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+ [(set_attr "type" "neon_minmax<q>")]
+@@ -1483,65 +1517,77 @@
+ ;; FP arithmetic operations.
+
+ (define_insn "add<mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (plus:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fadd\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_addsub_<stype><q>")]
+ )
+
+ (define_insn "sub<mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (minus:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fsub\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_addsub_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_addsub_<stype><q>")]
+ )
+
+ (define_insn "mul<mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (mult:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fmul\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_mul_<stype><q>")]
)
-(define_insn "div<mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")))]
+(define_expand "div<mode>3"
-+ [(set (match_operand:VDQF 0 "register_operand")
-+ (div:VDQF (match_operand:VDQF 1 "general_operand")
-+ (match_operand:VDQF 2 "register_operand")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+{
+ if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -470,10 +1035,54 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+})
+
+(define_insn "*div<mode>3"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
- (match_operand:VDQF 2 "register_operand" "w")))]
-@@ -1579,16 +1582,16 @@
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fdiv\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_div_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_div_<stype><q>")]
+ )
+
+ (define_insn "neg<mode>2"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (neg:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fneg\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_neg_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_neg_<stype><q>")]
+ )
+
+ (define_insn "abs<mode>2"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (abs:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fabs\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_abs_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_abs_<stype><q>")]
+ )
+
+ (define_insn "fma<mode>4"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (fma:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")
+- (match_operand:VDQF 3 "register_operand" "0")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (fma:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")
++ (match_operand:VHSDF 3 "register_operand" "0")))]
+ "TARGET_SIMD"
+ "fmla\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_mla_<stype><q>")]
+ )
+
+ (define_insn "*aarch64_fma4_elt<mode>"
+@@ -1579,16 +1625,16 @@
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
)
@@ -495,11 +1104,35 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- "fmla\\t%0.2d, %2.2d, %1.2d[0]"
- [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+ "fmla\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
-+ [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
++ [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
)
(define_insn "*aarch64_fma4_elt_to_64v2df"
-@@ -1656,17 +1659,17 @@
+@@ -1608,15 +1654,15 @@
+ )
+
+ (define_insn "fnma<mode>4"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (fma:VDQF
+- (match_operand:VDQF 1 "register_operand" "w")
+- (neg:VDQF
+- (match_operand:VDQF 2 "register_operand" "w"))
+- (match_operand:VDQF 3 "register_operand" "0")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (fma:VHSDF
++ (match_operand:VHSDF 1 "register_operand" "w")
++ (neg:VHSDF
++ (match_operand:VHSDF 2 "register_operand" "w"))
++ (match_operand:VHSDF 3 "register_operand" "0")))]
+ "TARGET_SIMD"
+- "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_mla_<Vetype><q>")]
++ "fmls\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
++ [(set_attr "type" "neon_fp_mla_<stype><q>")]
+ )
+
+ (define_insn "*aarch64_fnma4_elt<mode>"
+@@ -1656,17 +1702,17 @@
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
)
@@ -511,9 +1144,6 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- (vec_duplicate:V2DF
- (match_operand:DF 1 "register_operand" "w"))
- (match_operand:V2DF 3 "register_operand" "0")))]
-- "TARGET_SIMD"
-- "fmls\\t%0.2d, %2.2d, %1.2d[0]"
-- [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+(define_insn "*aarch64_fnma4_elt_from_dup<mode>"
+ [(set (match_operand:VMUL 0 "register_operand" "=w")
+ (fma:VMUL
@@ -522,42 +1152,201 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+ (vec_duplicate:VMUL
+ (match_operand:<VEL> 1 "register_operand" "w"))
+ (match_operand:VMUL 3 "register_operand" "0")))]
-+ "TARGET_SIMD"
+ "TARGET_SIMD"
+- "fmls\\t%0.2d, %2.2d, %1.2d[0]"
+- [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+ "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
-+ [(set_attr "type" "neon<fp>_mla_<Vetype>_scalar<q>")]
++ [(set_attr "type" "neon<fp>_mla_<stype>_scalar<q>")]
)
(define_insn "*aarch64_fnma4_elt_to_64v2df"
-@@ -1778,6 +1781,28 @@
+@@ -1689,24 +1735,50 @@
+ ;; Vector versions of the floating-point frint patterns.
+ ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
+ (define_insn "<frint_pattern><mode>2"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
+- FRINT))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
++ FRINT))]
+ "TARGET_SIMD"
+ "frint<frint_suffix>\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_round_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_round_<stype><q>")]
+ )
+
+ ;; Vector versions of the fcvt standard patterns.
+ ;; Expands to lbtrunc, lround, lceil, lfloor
+-(define_insn "l<fcvt_pattern><su_optab><VDQF:mode><fcvt_target>2"
++(define_insn "l<fcvt_pattern><su_optab><VHSDF:mode><fcvt_target>2"
+ [(set (match_operand:<FCVT_TARGET> 0 "register_operand" "=w")
+ (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
+- [(match_operand:VDQF 1 "register_operand" "w")]
++ [(match_operand:VHSDF 1 "register_operand" "w")]
+ FCVT)))]
+ "TARGET_SIMD"
+ "fcvt<frint_suffix><su>\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_to_int_<stype><q>")]
++)
++
++;; HF Scalar variants of related SIMD instructions.
++(define_insn "l<fcvt_pattern><su_optab>hfhi2"
++ [(set (match_operand:HI 0 "register_operand" "=w")
++ (FIXUORS:HI (unspec:HF [(match_operand:HF 1 "register_operand" "w")]
++ FCVT)))]
++ "TARGET_SIMD_F16INST"
++ "fcvt<frint_suffix><su>\t%h0, %h1"
++ [(set_attr "type" "neon_fp_to_int_s")]
++)
++
++(define_insn "<optab>_trunchfhi2"
++ [(set (match_operand:HI 0 "register_operand" "=w")
++ (FIXUORS:HI (match_operand:HF 1 "register_operand" "w")))]
++ "TARGET_SIMD_F16INST"
++ "fcvtz<su>\t%h0, %h1"
++ [(set_attr "type" "neon_fp_to_int_s")]
++)
++
++(define_insn "<optab>hihf2"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (FLOATUORS:HF (match_operand:HI 1 "register_operand" "w")))]
++ "TARGET_SIMD_F16INST"
++ "<su_optab>cvtf\t%h0, %h1"
++ [(set_attr "type" "neon_int_to_fp_s")]
+ )
+
+ (define_insn "*aarch64_fcvt<su_optab><VDQF:mode><fcvt_target>2_mult"
+@@ -1729,36 +1801,36 @@
+ [(set_attr "type" "neon_fp_to_int_<Vetype><q>")]
+ )
+
+-(define_expand "<optab><VDQF:mode><fcvt_target>2"
++(define_expand "<optab><VHSDF:mode><fcvt_target>2"
+ [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
+ (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
+- [(match_operand:VDQF 1 "register_operand")]
+- UNSPEC_FRINTZ)))]
++ [(match_operand:VHSDF 1 "register_operand")]
++ UNSPEC_FRINTZ)))]
+ "TARGET_SIMD"
+ {})
+
+-(define_expand "<fix_trunc_optab><VDQF:mode><fcvt_target>2"
++(define_expand "<fix_trunc_optab><VHSDF:mode><fcvt_target>2"
+ [(set (match_operand:<FCVT_TARGET> 0 "register_operand")
+ (FIXUORS:<FCVT_TARGET> (unspec:<FCVT_TARGET>
+- [(match_operand:VDQF 1 "register_operand")]
+- UNSPEC_FRINTZ)))]
++ [(match_operand:VHSDF 1 "register_operand")]
++ UNSPEC_FRINTZ)))]
+ "TARGET_SIMD"
+ {})
+
+-(define_expand "ftrunc<VDQF:mode>2"
+- [(set (match_operand:VDQF 0 "register_operand")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
+- UNSPEC_FRINTZ))]
++(define_expand "ftrunc<VHSDF:mode>2"
++ [(set (match_operand:VHSDF 0 "register_operand")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
++ UNSPEC_FRINTZ))]
+ "TARGET_SIMD"
+ {})
+
+-(define_insn "<optab><fcvt_target><VDQF:mode>2"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (FLOATUORS:VDQF
++(define_insn "<optab><fcvt_target><VHSDF:mode>2"
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (FLOATUORS:VHSDF
+ (match_operand:<FCVT_TARGET> 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "<su_optab>cvtf\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_int_to_fp_<Vetype><q>")]
++ [(set_attr "type" "neon_int_to_fp_<stype><q>")]
+ )
+
+ ;; Conversions between vectors of floats and doubles.
+@@ -1778,6 +1850,30 @@
[(set_attr "type" "neon_fp_cvt_widen_s")]
)
+;; Convert between fixed-point and floating-point (vector modes)
+
-+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VDQF:mode>3"
-+ [(set (match_operand:<VDQF:FCVT_TARGET> 0 "register_operand" "=w")
-+ (unspec:<VDQF:FCVT_TARGET> [(match_operand:VDQF 1 "register_operand" "w")
-+ (match_operand:SI 2 "immediate_operand" "i")]
++(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><VHSDF:mode>3"
++ [(set (match_operand:<VHSDF:FCVT_TARGET> 0 "register_operand" "=w")
++ (unspec:<VHSDF:FCVT_TARGET>
++ [(match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_F2FIXED))]
+ "TARGET_SIMD"
+ "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-+ [(set_attr "type" "neon_fp_to_int_<VDQF:Vetype><q>")]
++ [(set_attr "type" "neon_fp_to_int_<VHSDF:stype><q>")]
+)
+
-+(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_SDI:mode>3"
-+ [(set (match_operand:<VDQ_SDI:FCVT_TARGET> 0 "register_operand" "=w")
-+ (unspec:<VDQ_SDI:FCVT_TARGET> [(match_operand:VDQ_SDI 1 "register_operand" "w")
-+ (match_operand:SI 2 "immediate_operand" "i")]
++(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><VDQ_HSDI:mode>3"
++ [(set (match_operand:<VDQ_HSDI:FCVT_TARGET> 0 "register_operand" "=w")
++ (unspec:<VDQ_HSDI:FCVT_TARGET>
++ [(match_operand:VDQ_HSDI 1 "register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
+ FCVT_FIXED2F))]
+ "TARGET_SIMD"
+ "<FCVT_FIXED2F:fcvt_fixed_insn>\t%<v>0<Vmtype>, %<v>1<Vmtype>, #%2"
-+ [(set_attr "type" "neon_int_to_fp_<VDQ_SDI:Vetype><q>")]
++ [(set_attr "type" "neon_int_to_fp_<VDQ_HSDI:stype><q>")]
+)
+
;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
;; is inconsistent with vector ordering elsewhere in the compiler, in that
;; the meaning of HI and LO changes depending on the target endianness.
-@@ -1979,17 +2004,14 @@
+@@ -1934,33 +2030,25 @@
+ ;; NaNs.
+
+ (define_insn "<su><maxmin><mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "f<maxmin>nm\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_minmax_<stype><q>")]
+ )
+
++;; Vector forms for fmax, fmin, fmaxnm, fminnm.
++;; fmaxnm and fminnm are used for the fmax<mode>3 standard pattern names,
++;; which implement the IEEE fmax ()/fmin () functions.
+ (define_insn "<maxmin_uns><mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")]
+- FMAXMIN_UNS))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")]
++ FMAXMIN_UNS))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
+-)
+-
+-;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
+-(define_insn "<fmaxmin><mode>3"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+- (match_operand:VDQF 2 "register_operand" "w")]
+- FMAXMIN))]
+- "TARGET_SIMD"
+- "<fmaxmin_op>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+- [(set_attr "type" "neon_fp_minmax_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_minmax_<stype><q>")]
+ )
+
+ ;; 'across lanes' add.
+@@ -1979,17 +2067,14 @@
}
)
@@ -573,17 +1362,17 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
- DONE;
- }
+(define_insn "aarch64_faddp<mode>"
-+ [(set (match_operand:VDQF 0 "register_operand" "=w")
-+ (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-+ (match_operand:VDQF 2 "register_operand" "w")]
-+ UNSPEC_FADDV))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
++ (match_operand:VHSDF 2 "register_operand" "w")]
++ UNSPEC_FADDV))]
+ "TARGET_SIMD"
+ "faddp\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
-+ [(set_attr "type" "neon_fp_reduc_add_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_reduc_add_<stype><q>")]
)
(define_insn "aarch64_reduc_plus_internal<mode>"
-@@ -2010,24 +2032,15 @@
+@@ -2010,24 +2095,15 @@
[(set_attr "type" "neon_reduc_add")]
)
@@ -611,7 +1400,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
(define_expand "reduc_plus_scal_v4sf"
[(set (match_operand:SF 0 "register_operand")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand")]
-@@ -2036,8 +2049,8 @@
+@@ -2036,8 +2112,8 @@
{
rtx elt = GEN_INT (ENDIAN_LANE_N (V4SFmode, 0));
rtx scratch = gen_reg_rtx (V4SFmode);
@@ -622,7 +1411,35 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
emit_insn (gen_aarch64_get_lanev4sf (operands[0], scratch, elt));
DONE;
})
-@@ -2635,7 +2648,7 @@
+@@ -2072,8 +2148,8 @@
+ ;; gimple_fold'd to the REDUC_(MAX|MIN)_EXPR tree code. (This is FP smax/smin).
+ (define_expand "reduc_<maxmin_uns>_scal_<mode>"
+ [(match_operand:<VEL> 0 "register_operand")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand")]
+- FMAXMINV)]
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
++ FMAXMINV)]
+ "TARGET_SIMD"
+ {
+ rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
+@@ -2120,12 +2196,12 @@
+ )
+
+ (define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
+- FMAXMINV))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
++ FMAXMINV))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op><vp>\\t%<Vetype>0, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_reduc_minmax_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_reduc_minmax_<stype><q>")]
+ )
+
+ ;; aarch64_simd_bsl may compile to any of bsl/bif/bit depending on register
+@@ -2635,7 +2711,7 @@
(define_insn "*aarch64_combinez<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
@@ -631,7 +1448,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
(match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")))]
"TARGET_SIMD && !BYTES_BIG_ENDIAN"
"@
-@@ -2651,7 +2664,7 @@
+@@ -2651,7 +2727,7 @@
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
(match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")
@@ -640,14 +1457,146 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
"TARGET_SIMD && BYTES_BIG_ENDIAN"
"@
mov\\t%0.8b, %1.8b
-@@ -4297,7 +4310,16 @@
+@@ -2994,13 +3070,14 @@
+ ;; fmulx.
+
+ (define_insn "aarch64_fmulx<mode>"
+- [(set (match_operand:VALLF 0 "register_operand" "=w")
+- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
+- (match_operand:VALLF 2 "register_operand" "w")]
+- UNSPEC_FMULX))]
++ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
++ (unspec:VHSDF_HSDF
++ [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
++ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
++ UNSPEC_FMULX))]
+ "TARGET_SIMD"
+ "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
+- [(set_attr "type" "neon_fp_mul_<Vetype>")]
++ [(set_attr "type" "neon_fp_mul_<stype>")]
+ )
+
+ ;; vmulxq_lane_f32, and vmulx_laneq_f32
+@@ -3042,20 +3119,18 @@
+ [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
+ )
+
+-;; vmulxq_lane_f64
++;; vmulxq_lane
+
+-(define_insn "*aarch64_mulx_elt_to_64v2df"
+- [(set (match_operand:V2DF 0 "register_operand" "=w")
+- (unspec:V2DF
+- [(match_operand:V2DF 1 "register_operand" "w")
+- (vec_duplicate:V2DF
+- (match_operand:DF 2 "register_operand" "w"))]
++(define_insn "*aarch64_mulx_elt_from_dup<mode>"
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF
++ [(match_operand:VHSDF 1 "register_operand" "w")
++ (vec_duplicate:VHSDF
++ (match_operand:<VEL> 2 "register_operand" "w"))]
+ UNSPEC_FMULX))]
+ "TARGET_SIMD"
+- {
+- return "fmulx\t%0.2d, %1.2d, %2.d[0]";
+- }
+- [(set_attr "type" "neon_fp_mul_d_scalar_q")]
++ "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
++ [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
+ )
+
+ ;; vmulxs_lane_f32, vmulxs_laneq_f32
+@@ -3937,15 +4012,12 @@
+ "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
+ VSHLL))]
+ "TARGET_SIMD"
+- "*
+- int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
+- if (INTVAL (operands[2]) == bit_width)
+ {
+- return \"shll\\t%0.<Vwtype>, %1.<Vtype>, %2\";
++ if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (<MODE>mode))
++ return "shll\\t%0.<Vwtype>, %1.<Vtype>, %2";
++ else
++ return "<sur>shll\\t%0.<Vwtype>, %1.<Vtype>, %2";
+ }
+- else {
+- return \"<sur>shll\\t%0.<Vwtype>, %1.<Vtype>, %2\";
+- }"
+ [(set_attr "type" "neon_shift_imm_long")]
+ )
+
+@@ -3957,15 +4029,12 @@
+ (match_operand:SI 2 "immediate_operand" "i")]
+ VSHLL))]
+ "TARGET_SIMD"
+- "*
+- int bit_width = GET_MODE_UNIT_SIZE (<MODE>mode) * BITS_PER_UNIT;
+- if (INTVAL (operands[2]) == bit_width)
+ {
+- return \"shll2\\t%0.<Vwtype>, %1.<Vtype>, %2\";
++ if (INTVAL (operands[2]) == GET_MODE_UNIT_BITSIZE (<MODE>mode))
++ return "shll2\\t%0.<Vwtype>, %1.<Vtype>, %2";
++ else
++ return "<sur>shll2\\t%0.<Vwtype>, %1.<Vtype>, %2";
+ }
+- else {
+- return \"<sur>shll2\\t%0.<Vwtype>, %1.<Vtype>, %2\";
+- }"
+ [(set_attr "type" "neon_shift_imm_long")]
+ )
+
+@@ -4246,30 +4315,32 @@
+ [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w,w")
+ (neg:<V_cmp_result>
+ (COMPARISONS:<V_cmp_result>
+- (match_operand:VALLF 1 "register_operand" "w,w")
+- (match_operand:VALLF 2 "aarch64_simd_reg_or_zero" "w,YDz")
++ (match_operand:VHSDF_HSDF 1 "register_operand" "w,w")
++ (match_operand:VHSDF_HSDF 2 "aarch64_simd_reg_or_zero" "w,YDz")
+ )))]
+ "TARGET_SIMD"
+ "@
+ fcm<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>
+ fcm<optab>\t%<v>0<Vmtype>, %<v>1<Vmtype>, 0"
+- [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_compare_<stype><q>")]
+ )
+
+ ;; fac(ge|gt)
+ ;; Note we can also handle what would be fac(le|lt) by
+ ;; generating fac(ge|gt).
+
+-(define_insn "*aarch64_fac<optab><mode>"
++(define_insn "aarch64_fac<optab><mode>"
+ [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
+ (neg:<V_cmp_result>
+ (FAC_COMPARISONS:<V_cmp_result>
+- (abs:VALLF (match_operand:VALLF 1 "register_operand" "w"))
+- (abs:VALLF (match_operand:VALLF 2 "register_operand" "w"))
++ (abs:VHSDF_HSDF
++ (match_operand:VHSDF_HSDF 1 "register_operand" "w"))
++ (abs:VHSDF_HSDF
++ (match_operand:VHSDF_HSDF 2 "register_operand" "w"))
+ )))]
+ "TARGET_SIMD"
+ "fac<n_optab>\t%<v>0<Vmtype>, %<v><cmp_1><Vmtype>, %<v><cmp_2><Vmtype>"
+- [(set_attr "type" "neon_fp_compare_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_compare_<stype><q>")]
+ )
+
+ ;; addp
+@@ -4297,12 +4368,21 @@
;; sqrt
-(define_insn "sqrt<mode>2"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
+(define_expand "sqrt<mode>2"
-+ [(set (match_operand:VDQF 0 "register_operand")
-+ (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+{
+ if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -655,10 +1604,16 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+})
+
+(define_insn "*sqrt<mode>2"
- [(set (match_operand:VDQF 0 "register_operand" "=w")
- (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (sqrt:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
"TARGET_SIMD"
-@@ -4652,7 +4674,7 @@
+ "fsqrt\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_sqrt_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_sqrt_<stype><q>")]
+ )
+
+ ;; Patterns for vector struct loads and stores.
+@@ -4652,7 +4732,7 @@
ld1\\t{%S0.16b - %<Vendreg>0.16b}, %1"
[(set_attr "type" "multiple,neon_store<nregs>_<nregs>reg_q,\
neon_load<nregs>_<nregs>reg_q")
@@ -667,7 +1622,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "aarch64_be_ld1<mode>"
-@@ -4685,7 +4707,7 @@
+@@ -4685,7 +4765,7 @@
stp\\t%q1, %R1, %0
ldp\\t%q0, %R0, %1"
[(set_attr "type" "multiple,neon_stp_q,neon_ldp_q")
@@ -676,7 +1631,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*aarch64_be_movci"
-@@ -4696,7 +4718,7 @@
+@@ -4696,7 +4776,7 @@
|| register_operand (operands[1], CImode))"
"#"
[(set_attr "type" "multiple")
@@ -685,7 +1640,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*aarch64_be_movxi"
-@@ -4707,7 +4729,7 @@
+@@ -4707,7 +4787,7 @@
|| register_operand (operands[1], XImode))"
"#"
[(set_attr "type" "multiple")
@@ -694,7 +1649,362 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_split
-@@ -5414,13 +5436,25 @@
+@@ -4787,7 +4867,7 @@
+ DONE;
+ })
+
+-(define_insn "aarch64_ld2<mode>_dreg"
++(define_insn "aarch64_ld2<mode>_dreg_le"
+ [(set (match_operand:OI 0 "register_operand" "=w")
+ (subreg:OI
+ (vec_concat:<VRL2>
+@@ -4800,12 +4880,30 @@
+ (unspec:VD [(match_dup 1)]
+ UNSPEC_LD2)
+ (vec_duplicate:VD (const_int 0)))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+ "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
+ [(set_attr "type" "neon_load2_2reg<q>")]
+ )
+
+-(define_insn "aarch64_ld2<mode>_dreg"
++(define_insn "aarch64_ld2<mode>_dreg_be"
++ [(set (match_operand:OI 0 "register_operand" "=w")
++ (subreg:OI
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD2))
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD2))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
++ "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
++ [(set_attr "type" "neon_load2_2reg<q>")]
++)
++
++(define_insn "aarch64_ld2<mode>_dreg_le"
+ [(set (match_operand:OI 0 "register_operand" "=w")
+ (subreg:OI
+ (vec_concat:<VRL2>
+@@ -4818,12 +4916,30 @@
+ (unspec:DX [(match_dup 1)]
+ UNSPEC_LD2)
+ (const_int 0))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
++ "ld1\\t{%S0.1d - %T0.1d}, %1"
++ [(set_attr "type" "neon_load1_2reg<q>")]
++)
++
++(define_insn "aarch64_ld2<mode>_dreg_be"
++ [(set (match_operand:OI 0 "register_operand" "=w")
++ (subreg:OI
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD2))
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD2))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "ld1\\t{%S0.1d - %T0.1d}, %1"
+ [(set_attr "type" "neon_load1_2reg<q>")]
+ )
+
+-(define_insn "aarch64_ld3<mode>_dreg"
++(define_insn "aarch64_ld3<mode>_dreg_le"
+ [(set (match_operand:CI 0 "register_operand" "=w")
+ (subreg:CI
+ (vec_concat:<VRL3>
+@@ -4841,12 +4957,35 @@
+ (unspec:VD [(match_dup 1)]
+ UNSPEC_LD3)
+ (vec_duplicate:VD (const_int 0)))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
++ "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
++ [(set_attr "type" "neon_load3_3reg<q>")]
++)
++
++(define_insn "aarch64_ld3<mode>_dreg_be"
++ [(set (match_operand:CI 0 "register_operand" "=w")
++ (subreg:CI
++ (vec_concat:<VRL3>
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD3))
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD3)))
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD3))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
+ [(set_attr "type" "neon_load3_3reg<q>")]
+ )
+
+-(define_insn "aarch64_ld3<mode>_dreg"
++(define_insn "aarch64_ld3<mode>_dreg_le"
+ [(set (match_operand:CI 0 "register_operand" "=w")
+ (subreg:CI
+ (vec_concat:<VRL3>
+@@ -4864,12 +5003,35 @@
+ (unspec:DX [(match_dup 1)]
+ UNSPEC_LD3)
+ (const_int 0))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+ "ld1\\t{%S0.1d - %U0.1d}, %1"
+ [(set_attr "type" "neon_load1_3reg<q>")]
+ )
+
+-(define_insn "aarch64_ld4<mode>_dreg"
++(define_insn "aarch64_ld3<mode>_dreg_be"
++ [(set (match_operand:CI 0 "register_operand" "=w")
++ (subreg:CI
++ (vec_concat:<VRL3>
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD3))
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD3)))
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD3))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
++ "ld1\\t{%S0.1d - %U0.1d}, %1"
++ [(set_attr "type" "neon_load1_3reg<q>")]
++)
++
++(define_insn "aarch64_ld4<mode>_dreg_le"
+ [(set (match_operand:XI 0 "register_operand" "=w")
+ (subreg:XI
+ (vec_concat:<VRL4>
+@@ -4880,9 +5042,9 @@
+ UNSPEC_LD4)
+ (vec_duplicate:VD (const_int 0)))
+ (vec_concat:<VDBL>
+- (unspec:VD [(match_dup 1)]
++ (unspec:VD [(match_dup 1)]
+ UNSPEC_LD4)
+- (vec_duplicate:VD (const_int 0))))
++ (vec_duplicate:VD (const_int 0))))
+ (vec_concat:<VRL2>
+ (vec_concat:<VDBL>
+ (unspec:VD [(match_dup 1)]
+@@ -4892,12 +5054,40 @@
+ (unspec:VD [(match_dup 1)]
+ UNSPEC_LD4)
+ (vec_duplicate:VD (const_int 0))))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
++ "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
++ [(set_attr "type" "neon_load4_4reg<q>")]
++)
++
++(define_insn "aarch64_ld4<mode>_dreg_be"
++ [(set (match_operand:XI 0 "register_operand" "=w")
++ (subreg:XI
++ (vec_concat:<VRL4>
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD4))
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD4)))
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD4))
++ (vec_concat:<VDBL>
++ (vec_duplicate:VD (const_int 0))
++ (unspec:VD [(match_dup 1)]
++ UNSPEC_LD4)))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
+ [(set_attr "type" "neon_load4_4reg<q>")]
+ )
+
+-(define_insn "aarch64_ld4<mode>_dreg"
++(define_insn "aarch64_ld4<mode>_dreg_le"
+ [(set (match_operand:XI 0 "register_operand" "=w")
+ (subreg:XI
+ (vec_concat:<VRL4>
+@@ -4910,7 +5100,7 @@
+ (vec_concat:<VDBL>
+ (unspec:DX [(match_dup 1)]
+ UNSPEC_LD4)
+- (const_int 0)))
++ (const_int 0)))
+ (vec_concat:<VRL2>
+ (vec_concat:<VDBL>
+ (unspec:DX [(match_dup 1)]
+@@ -4920,7 +5110,35 @@
+ (unspec:DX [(match_dup 1)]
+ UNSPEC_LD4)
+ (const_int 0)))) 0))]
+- "TARGET_SIMD"
++ "TARGET_SIMD && !BYTES_BIG_ENDIAN"
++ "ld1\\t{%S0.1d - %V0.1d}, %1"
++ [(set_attr "type" "neon_load1_4reg<q>")]
++)
++
++(define_insn "aarch64_ld4<mode>_dreg_be"
++ [(set (match_operand:XI 0 "register_operand" "=w")
++ (subreg:XI
++ (vec_concat:<VRL4>
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX
++ [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
++ UNSPEC_LD4))
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD4)))
++ (vec_concat:<VRL2>
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD4))
++ (vec_concat:<VDBL>
++ (const_int 0)
++ (unspec:DX [(match_dup 1)]
++ UNSPEC_LD4)))) 0))]
++ "TARGET_SIMD && BYTES_BIG_ENDIAN"
+ "ld1\\t{%S0.1d - %V0.1d}, %1"
+ [(set_attr "type" "neon_load1_4reg<q>")]
+ )
+@@ -4934,7 +5152,12 @@
+ rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
+ set_mem_size (mem, <VSTRUCT:nregs> * 8);
+
+- emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg (operands[0], mem));
++ if (BYTES_BIG_ENDIAN)
++ emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg_be (operands[0],
++ mem));
++ else
++ emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg_le (operands[0],
++ mem));
+ DONE;
+ })
+
+@@ -5160,10 +5383,10 @@
+ )
+
+ (define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>"
+- [(set (match_operand:VALL 0 "register_operand" "=w")
+- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
+- (match_operand:VALL 2 "register_operand" "w")]
+- PERMUTE))]
++ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
++ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
++ (match_operand:VALL_F16 2 "register_operand" "w")]
++ PERMUTE))]
+ "TARGET_SIMD"
+ "<PERMUTE:perm_insn><PERMUTE:perm_hilo>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+ [(set_attr "type" "neon_permute<q>")]
+@@ -5171,11 +5394,11 @@
+
+ ;; Note immediate (third) operand is lane index not byte index.
+ (define_insn "aarch64_ext<mode>"
+- [(set (match_operand:VALL 0 "register_operand" "=w")
+- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
+- (match_operand:VALL 2 "register_operand" "w")
+- (match_operand:SI 3 "immediate_operand" "i")]
+- UNSPEC_EXT))]
++ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
++ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
++ (match_operand:VALL_F16 2 "register_operand" "w")
++ (match_operand:SI 3 "immediate_operand" "i")]
++ UNSPEC_EXT))]
+ "TARGET_SIMD"
+ {
+ operands[3] = GEN_INT (INTVAL (operands[3])
+@@ -5186,8 +5409,8 @@
+ )
+
+ (define_insn "aarch64_rev<REVERSE:rev_op><mode>"
+- [(set (match_operand:VALL 0 "register_operand" "=w")
+- (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")]
++ [(set (match_operand:VALL_F16 0 "register_operand" "=w")
++ (unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")]
+ REVERSE))]
+ "TARGET_SIMD"
+ "rev<REVERSE:rev_op>\\t%0.<Vtype>, %1.<Vtype>"
+@@ -5354,31 +5577,32 @@
+ )
+
+ (define_insn "aarch64_frecpe<mode>"
+- [(set (match_operand:VDQF 0 "register_operand" "=w")
+- (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
+- UNSPEC_FRECPE))]
++ [(set (match_operand:VHSDF 0 "register_operand" "=w")
++ (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")]
++ UNSPEC_FRECPE))]
+ "TARGET_SIMD"
+ "frecpe\\t%0.<Vtype>, %1.<Vtype>"
+- [(set_attr "type" "neon_fp_recpe_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_recpe_<stype><q>")]
+ )
+
+ (define_insn "aarch64_frecp<FRECP:frecp_suffix><mode>"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+- FRECP))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
++ FRECP))]
+ "TARGET_SIMD"
+ "frecp<FRECP:frecp_suffix>\\t%<s>0, %<s>1"
+- [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF:Vetype><GPF:q>")]
++ [(set_attr "type" "neon_fp_recp<FRECP:frecp_suffix>_<GPF_F16:stype>")]
+ )
+
+ (define_insn "aarch64_frecps<mode>"
+- [(set (match_operand:VALLF 0 "register_operand" "=w")
+- (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
+- (match_operand:VALLF 2 "register_operand" "w")]
+- UNSPEC_FRECPS))]
++ [(set (match_operand:VHSDF_HSDF 0 "register_operand" "=w")
++ (unspec:VHSDF_HSDF
++ [(match_operand:VHSDF_HSDF 1 "register_operand" "w")
++ (match_operand:VHSDF_HSDF 2 "register_operand" "w")]
++ UNSPEC_FRECPS))]
+ "TARGET_SIMD"
+ "frecps\\t%<v>0<Vmtype>, %<v>1<Vmtype>, %<v>2<Vmtype>"
+- [(set_attr "type" "neon_fp_recps_<Vetype><q>")]
++ [(set_attr "type" "neon_fp_recps_<stype><q>")]
+ )
+
+ (define_insn "aarch64_urecpe<mode>"
+@@ -5414,13 +5638,25 @@
[(set_attr "type" "crypto_aese")]
)
@@ -730,22 +2040,51 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
;; Generated automatically by gentune.sh from aarch64-cores.def
(define_attr "tune"
- "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
-+ "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53"
++ "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
--- a/src/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/src/gcc/config/aarch64/aarch64-tuning-flags.def
-@@ -29,5 +29,3 @@
+@@ -29,5 +29,8 @@
AARCH64_TUNE_ to give an enum name. */
AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
--
+
++/* Don't create non-8 byte aligned load/store pair. That is if the
++two load/stores are not at least 8 byte aligned don't create load/store
++pairs. */
++AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
--- a/src/gcc/config/aarch64/aarch64.c
+++ b/src/gcc/config/aarch64/aarch64.c
-@@ -250,6 +250,22 @@ static const struct cpu_addrcost_table xgene1_addrcost_table =
+@@ -152,7 +152,7 @@ enum aarch64_processor aarch64_tune = cortexa53;
+ unsigned long aarch64_tune_flags = 0;
+
+ /* Global flag for PC relative loads. */
+-bool aarch64_nopcrelative_literal_loads;
++bool aarch64_pcrelative_literal_loads;
+
+ /* Support for command line parsing of boolean flags in the tuning
+ structures. */
+@@ -250,6 +250,38 @@ static const struct cpu_addrcost_table xgene1_addrcost_table =
0, /* imm_offset */
};
++static const struct cpu_addrcost_table qdf24xx_addrcost_table =
++{
++ {
++ 1, /* hi */
++ 0, /* si */
++ 0, /* di */
++ 1, /* ti */
++ },
++ 0, /* pre_modify */
++ 0, /* post_modify */
++ 0, /* register_offset */
++ 0, /* register_sextend */
++ 0, /* register_zextend */
++ 0 /* imm_offset */
++};
++
+static const struct cpu_addrcost_table vulcan_addrcost_table =
+{
+ {
@@ -765,10 +2104,19 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
static const struct cpu_regmove_cost generic_regmove_cost =
{
1, /* GP2GP */
-@@ -308,6 +324,15 @@ static const struct cpu_regmove_cost xgene1_regmove_cost =
+@@ -308,6 +340,24 @@ static const struct cpu_regmove_cost xgene1_regmove_cost =
2 /* FP2FP */
};
++static const struct cpu_regmove_cost qdf24xx_regmove_cost =
++{
++ 2, /* GP2GP */
++ /* Avoid the use of int<->fp moves for spilling. */
++ 6, /* GP2FP */
++ 6, /* FP2GP */
++ 4 /* FP2FP */
++};
++
+static const struct cpu_regmove_cost vulcan_regmove_cost =
+{
+ 1, /* GP2GP */
@@ -781,7 +2129,32 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* Generic costs for vector insn classes. */
static const struct cpu_vector_cost generic_vector_cost =
{
-@@ -379,6 +404,24 @@ static const struct cpu_vector_cost xgene1_vector_cost =
+@@ -326,6 +376,24 @@ static const struct cpu_vector_cost generic_vector_cost =
+ 1 /* cond_not_taken_branch_cost */
+ };
+
++/* ThunderX costs for vector insn classes. */
++static const struct cpu_vector_cost thunderx_vector_cost =
++{
++ 1, /* scalar_stmt_cost */
++ 3, /* scalar_load_cost */
++ 1, /* scalar_store_cost */
++ 4, /* vec_stmt_cost */
++ 4, /* vec_permute_cost */
++ 2, /* vec_to_scalar_cost */
++ 2, /* scalar_to_vec_cost */
++ 3, /* vec_align_load_cost */
++ 10, /* vec_unalign_load_cost */
++ 10, /* vec_unalign_store_cost */
++ 1, /* vec_store_cost */
++ 3, /* cond_taken_branch_cost */
++ 3 /* cond_not_taken_branch_cost */
++};
++
+ /* Generic costs for vector insn classes. */
+ static const struct cpu_vector_cost cortexa57_vector_cost =
+ {
+@@ -379,6 +447,24 @@ static const struct cpu_vector_cost xgene1_vector_cost =
1 /* cond_not_taken_branch_cost */
};
@@ -806,7 +2179,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* Generic costs for branch instructions. */
static const struct cpu_branch_cost generic_branch_cost =
{
-@@ -393,6 +436,37 @@ static const struct cpu_branch_cost cortexa57_branch_cost =
+@@ -393,6 +479,37 @@ static const struct cpu_branch_cost cortexa57_branch_cost =
3 /* Unpredictable. */
};
@@ -844,7 +2217,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
static const struct tune_params generic_tunings =
{
&cortexa57_extra_costs,
-@@ -400,6 +474,7 @@ static const struct tune_params generic_tunings =
+@@ -400,6 +517,7 @@ static const struct tune_params generic_tunings =
&generic_regmove_cost,
&generic_vector_cost,
&generic_branch_cost,
@@ -852,23 +2225,46 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
4, /* memmov_cost */
2, /* issue_rate */
AARCH64_FUSE_NOTHING, /* fusible_ops */
-@@ -424,6 +499,7 @@ static const struct tune_params cortexa35_tunings =
+@@ -423,14 +541,15 @@ static const struct tune_params cortexa35_tunings =
+ &generic_addrcost_table,
&cortexa53_regmove_cost,
&generic_vector_cost,
- &generic_branch_cost,
+- &generic_branch_cost,
++ &cortexa57_branch_cost,
+ &generic_approx_modes,
4, /* memmov_cost */
1, /* issue_rate */
- (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-@@ -449,6 +525,7 @@ static const struct tune_params cortexa53_tunings =
+- (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
++ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+ | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */
+- 8, /* function_align. */
++ 16, /* function_align. */
+ 8, /* jump_align. */
+- 4, /* loop_align. */
++ 8, /* loop_align. */
+ 2, /* int_reassoc_width. */
+ 4, /* fp_reassoc_width. */
+ 1, /* vec_reassoc_width. */
+@@ -448,14 +567,15 @@ static const struct tune_params cortexa53_tunings =
+ &generic_addrcost_table,
&cortexa53_regmove_cost,
&generic_vector_cost,
- &generic_branch_cost,
+- &generic_branch_cost,
++ &cortexa57_branch_cost,
+ &generic_approx_modes,
4, /* memmov_cost */
2, /* issue_rate */
(AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-@@ -474,6 +551,7 @@ static const struct tune_params cortexa57_tunings =
+ | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */
+- 8, /* function_align. */
++ 16, /* function_align. */
+ 8, /* jump_align. */
+- 4, /* loop_align. */
++ 8, /* loop_align. */
+ 2, /* int_reassoc_width. */
+ 4, /* fp_reassoc_width. */
+ 1, /* vec_reassoc_width. */
+@@ -474,13 +594,14 @@ static const struct tune_params cortexa57_tunings =
&cortexa57_regmove_cost,
&cortexa57_vector_cost,
&cortexa57_branch_cost,
@@ -876,15 +2272,68 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
4, /* memmov_cost */
3, /* issue_rate */
(AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-@@ -499,6 +577,7 @@ static const struct tune_params cortexa72_tunings =
+ | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */
+ 16, /* function_align. */
+ 8, /* jump_align. */
+- 4, /* loop_align. */
++ 8, /* loop_align. */
+ 2, /* int_reassoc_width. */
+ 4, /* fp_reassoc_width. */
+ 1, /* vec_reassoc_width. */
+@@ -498,14 +619,15 @@ static const struct tune_params cortexa72_tunings =
+ &cortexa57_addrcost_table,
&cortexa57_regmove_cost,
&cortexa57_vector_cost,
- &generic_branch_cost,
+- &generic_branch_cost,
++ &cortexa57_branch_cost,
+ &generic_approx_modes,
4, /* memmov_cost */
3, /* issue_rate */
(AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
-@@ -524,6 +603,7 @@ static const struct tune_params exynosm1_tunings =
+ | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */
+ 16, /* function_align. */
+ 8, /* jump_align. */
+- 4, /* loop_align. */
++ 8, /* loop_align. */
+ 2, /* int_reassoc_width. */
+ 4, /* fp_reassoc_width. */
+ 1, /* vec_reassoc_width. */
+@@ -513,7 +635,33 @@ static const struct tune_params cortexa72_tunings =
+ 2, /* min_div_recip_mul_df. */
+ 0, /* max_case_values. */
+ 0, /* cache_line_size. */
+- tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
++ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
++ (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
++};
++
++static const struct tune_params cortexa73_tunings =
++{
++ &cortexa57_extra_costs,
++ &cortexa57_addrcost_table,
++ &cortexa57_regmove_cost,
++ &cortexa57_vector_cost,
++ &cortexa57_branch_cost,
++ &generic_approx_modes,
++ 4, /* memmov_cost. */
++ 2, /* issue_rate. */
++ (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
++ | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */
++ 16, /* function_align. */
++ 8, /* jump_align. */
++ 8, /* loop_align. */
++ 2, /* int_reassoc_width. */
++ 4, /* fp_reassoc_width. */
++ 1, /* vec_reassoc_width. */
++ 2, /* min_div_recip_mul_sf. */
++ 2, /* min_div_recip_mul_df. */
++ 0, /* max_case_values. */
++ 0, /* cache_line_size. */
++ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
+ (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
+ };
+
+@@ -524,6 +672,7 @@ static const struct tune_params exynosm1_tunings =
&exynosm1_regmove_cost,
&exynosm1_vector_cost,
&generic_branch_cost,
@@ -892,7 +2341,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
4, /* memmov_cost */
3, /* issue_rate */
(AARCH64_FUSE_AES_AESMC), /* fusible_ops */
-@@ -538,7 +618,7 @@ static const struct tune_params exynosm1_tunings =
+@@ -538,7 +687,7 @@ static const struct tune_params exynosm1_tunings =
48, /* max_case_values. */
64, /* cache_line_size. */
tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */
@@ -901,15 +2350,27 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
};
static const struct tune_params thunderx_tunings =
-@@ -548,6 +628,7 @@ static const struct tune_params thunderx_tunings =
+@@ -546,8 +695,9 @@ static const struct tune_params thunderx_tunings =
+ &thunderx_extra_costs,
+ &generic_addrcost_table,
&thunderx_regmove_cost,
- &generic_vector_cost,
+- &generic_vector_cost,
++ &thunderx_vector_cost,
&generic_branch_cost,
+ &generic_approx_modes,
6, /* memmov_cost */
2, /* issue_rate */
AARCH64_FUSE_CMP_BRANCH, /* fusible_ops */
-@@ -572,6 +653,7 @@ static const struct tune_params xgene1_tunings =
+@@ -562,7 +712,7 @@ static const struct tune_params thunderx_tunings =
+ 0, /* max_case_values. */
+ 0, /* cache_line_size. */
+ tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
+- (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
++ (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW) /* tune_flags. */
+ };
+
+ static const struct tune_params xgene1_tunings =
+@@ -572,6 +722,7 @@ static const struct tune_params xgene1_tunings =
&xgene1_regmove_cost,
&xgene1_vector_cost,
&generic_branch_cost,
@@ -917,7 +2378,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
6, /* memmov_cost */
4, /* issue_rate */
AARCH64_FUSE_NOTHING, /* fusible_ops */
-@@ -586,7 +668,32 @@ static const struct tune_params xgene1_tunings =
+@@ -586,7 +737,58 @@ static const struct tune_params xgene1_tunings =
0, /* max_case_values. */
0, /* cache_line_size. */
tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */
@@ -925,6 +2386,32 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+ (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
+};
+
++static const struct tune_params qdf24xx_tunings =
++{
++ &qdf24xx_extra_costs,
++ &qdf24xx_addrcost_table,
++ &qdf24xx_regmove_cost,
++ &generic_vector_cost,
++ &generic_branch_cost,
++ &generic_approx_modes,
++ 4, /* memmov_cost */
++ 4, /* issue_rate */
++ (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
++ | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */
++ 16, /* function_align. */
++ 8, /* jump_align. */
++ 16, /* loop_align. */
++ 2, /* int_reassoc_width. */
++ 4, /* fp_reassoc_width. */
++ 1, /* vec_reassoc_width. */
++ 2, /* min_div_recip_mul_sf. */
++ 2, /* min_div_recip_mul_df. */
++ 0, /* max_case_values. */
++ 64, /* cache_line_size. */
++ tune_params::AUTOPREFETCHER_STRONG, /* autoprefetcher_model. */
++ (AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */
++};
++
+static const struct tune_params vulcan_tunings =
+{
+ &vulcan_extra_costs,
@@ -951,505 +2438,1633 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
};
/* Support for fine-grained override of the tuning structures. */
-@@ -3582,7 +3689,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
- return aarch64_tls_referenced_p (x);
+@@ -663,16 +865,6 @@ struct aarch64_option_extension
+ const unsigned long flags_off;
+ };
+
+-/* ISA extensions in AArch64. */
+-static const struct aarch64_option_extension all_extensions[] =
+-{
+-#define AARCH64_OPT_EXTENSION(NAME, X, FLAGS_ON, FLAGS_OFF, FEATURE_STRING) \
+- {NAME, FLAGS_ON, FLAGS_OFF},
+-#include "aarch64-option-extensions.def"
+-#undef AARCH64_OPT_EXTENSION
+- {NULL, 0, 0}
+-};
+-
+ typedef enum aarch64_cond_code
+ {
+ AARCH64_EQ = 0, AARCH64_NE, AARCH64_CS, AARCH64_CC, AARCH64_MI, AARCH64_PL,
+@@ -1703,7 +1895,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
+ we need to expand the literal pool access carefully.
+ This is something that needs to be done in a number
+ of places, so could well live as a separate function. */
+- if (aarch64_nopcrelative_literal_loads)
++ if (!aarch64_pcrelative_literal_loads)
+ {
+ gcc_assert (can_create_pseudo_p ());
+ base = gen_reg_rtx (ptr_mode);
+@@ -1766,6 +1958,61 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
+ aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest));
}
--/* Implement TARGET_CASE_VALUES_THRESHOLD. */
-+/* Implement TARGET_CASE_VALUES_THRESHOLD.
-+ The expansion for a table switch is quite expensive due to the number
-+ of instructions, the table lookup and hard to predict indirect jump.
-+ When optimizing for speed, and -O3 enabled, use the per-core tuning if
-+ set, otherwise use tables for > 16 cases as a tradeoff between size and
-+ performance. When optimizing for size, use the default setting. */
++/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to held
++ intermediate value if necessary.
++
++ This function is sometimes used to adjust the stack pointer, so we must
++ ensure that it can never cause transient stack deallocation by writing an
++ invalid value into REGNUM. */
++
++static void
++aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
++ HOST_WIDE_INT delta, bool frame_related_p)
++{
++ HOST_WIDE_INT mdelta = abs_hwi (delta);
++ rtx this_rtx = gen_rtx_REG (mode, regnum);
++ rtx_insn *insn;
++
++ /* Do nothing if mdelta is zero. */
++ if (!mdelta)
++ return;
++
++ /* We only need single instruction if the offset fit into add/sub. */
++ if (aarch64_uimm12_shift (mdelta))
++ {
++ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
++ RTX_FRAME_RELATED_P (insn) = frame_related_p;
++ return;
++ }
++
++ /* We need two add/sub instructions, each one performing part of the
++ calculation. Don't do this if the addend can be loaded into register with
++ a single instruction, in that case we prefer a move to a scratch register
++ following by an addition. */
++ if (mdelta < 0x1000000 && !aarch64_move_imm (delta, mode))
++ {
++ HOST_WIDE_INT low_off = mdelta & 0xfff;
++
++ low_off = delta < 0 ? -low_off : low_off;
++ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
++ RTX_FRAME_RELATED_P (insn) = frame_related_p;
++ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
++ RTX_FRAME_RELATED_P (insn) = frame_related_p;
++ return;
++ }
++
++ /* Otherwise use generic function to handle all other situations. */
++ rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
++ aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
++ insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
++ if (frame_related_p)
++ {
++ RTX_FRAME_RELATED_P (insn) = frame_related_p;
++ rtx adj = plus_constant (mode, this_rtx, delta);
++ add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
++ }
++}
++
+ static bool
+ aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
+ tree exp ATTRIBUTE_UNUSED)
+@@ -2498,8 +2745,8 @@ aarch64_layout_frame (void)
+ #define SLOT_NOT_REQUIRED (-2)
+ #define SLOT_REQUIRED (-1)
- static unsigned int
- aarch64_case_values_threshold (void)
-@@ -3593,7 +3705,7 @@ aarch64_case_values_threshold (void)
- && selected_cpu->tune->max_case_values != 0)
- return selected_cpu->tune->max_case_values;
- else
-- return default_case_values_threshold ();
-+ return optimize_size ? default_case_values_threshold () : 17;
+- cfun->machine->frame.wb_candidate1 = FIRST_PSEUDO_REGISTER;
+- cfun->machine->frame.wb_candidate2 = FIRST_PSEUDO_REGISTER;
++ cfun->machine->frame.wb_candidate1 = INVALID_REGNUM;
++ cfun->machine->frame.wb_candidate2 = INVALID_REGNUM;
+
+ /* First mark all the registers that really need to be saved... */
+ for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
+@@ -2533,7 +2780,6 @@ aarch64_layout_frame (void)
+ cfun->machine->frame.wb_candidate1 = R29_REGNUM;
+ cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
+ cfun->machine->frame.wb_candidate2 = R30_REGNUM;
+- cfun->machine->frame.hardfp_offset = 2 * UNITS_PER_WORD;
+ offset += 2 * UNITS_PER_WORD;
+ }
+
+@@ -2542,9 +2788,9 @@ aarch64_layout_frame (void)
+ if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
+ {
+ cfun->machine->frame.reg_offset[regno] = offset;
+- if (cfun->machine->frame.wb_candidate1 == FIRST_PSEUDO_REGISTER)
++ if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
+ cfun->machine->frame.wb_candidate1 = regno;
+- else if (cfun->machine->frame.wb_candidate2 == FIRST_PSEUDO_REGISTER)
++ else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM)
+ cfun->machine->frame.wb_candidate2 = regno;
+ offset += UNITS_PER_WORD;
+ }
+@@ -2553,24 +2799,23 @@ aarch64_layout_frame (void)
+ if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
+ {
+ cfun->machine->frame.reg_offset[regno] = offset;
+- if (cfun->machine->frame.wb_candidate1 == FIRST_PSEUDO_REGISTER)
++ if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
+ cfun->machine->frame.wb_candidate1 = regno;
+- else if (cfun->machine->frame.wb_candidate2 == FIRST_PSEUDO_REGISTER
++ else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
+ && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
+ cfun->machine->frame.wb_candidate2 = regno;
+ offset += UNITS_PER_WORD;
+ }
+
+- cfun->machine->frame.padding0 =
+- (ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT) - offset);
+ offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+
+ cfun->machine->frame.saved_regs_size = offset;
+
++ HOST_WIDE_INT varargs_and_saved_regs_size
++ = offset + cfun->machine->frame.saved_varargs_size;
++
+ cfun->machine->frame.hard_fp_offset
+- = ROUND_UP (cfun->machine->frame.saved_varargs_size
+- + get_frame_size ()
+- + cfun->machine->frame.saved_regs_size,
++ = ROUND_UP (varargs_and_saved_regs_size + get_frame_size (),
+ STACK_BOUNDARY / BITS_PER_UNIT);
+
+ cfun->machine->frame.frame_size
+@@ -2578,6 +2823,77 @@ aarch64_layout_frame (void)
+ + crtl->outgoing_args_size,
+ STACK_BOUNDARY / BITS_PER_UNIT);
+
++ cfun->machine->frame.locals_offset = cfun->machine->frame.saved_varargs_size;
++
++ cfun->machine->frame.initial_adjust = 0;
++ cfun->machine->frame.final_adjust = 0;
++ cfun->machine->frame.callee_adjust = 0;
++ cfun->machine->frame.callee_offset = 0;
++
++ HOST_WIDE_INT max_push_offset = 0;
++ if (cfun->machine->frame.wb_candidate2 != INVALID_REGNUM)
++ max_push_offset = 512;
++ else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM)
++ max_push_offset = 256;
++
++ if (cfun->machine->frame.frame_size < max_push_offset
++ && crtl->outgoing_args_size == 0)
++ {
++ /* Simple, small frame with no outgoing arguments:
++ stp reg1, reg2, [sp, -frame_size]!
++ stp reg3, reg4, [sp, 16] */
++ cfun->machine->frame.callee_adjust = cfun->machine->frame.frame_size;
++ }
++ else if ((crtl->outgoing_args_size
++ + cfun->machine->frame.saved_regs_size < 512)
++ && !(cfun->calls_alloca
++ && cfun->machine->frame.hard_fp_offset < max_push_offset))
++ {
++ /* Frame with small outgoing arguments:
++ sub sp, sp, frame_size
++ stp reg1, reg2, [sp, outgoing_args_size]
++ stp reg3, reg4, [sp, outgoing_args_size + 16] */
++ cfun->machine->frame.initial_adjust = cfun->machine->frame.frame_size;
++ cfun->machine->frame.callee_offset
++ = cfun->machine->frame.frame_size - cfun->machine->frame.hard_fp_offset;
++ }
++ else if (cfun->machine->frame.hard_fp_offset < max_push_offset)
++ {
++ /* Frame with large outgoing arguments but a small local area:
++ stp reg1, reg2, [sp, -hard_fp_offset]!
++ stp reg3, reg4, [sp, 16]
++ sub sp, sp, outgoing_args_size */
++ cfun->machine->frame.callee_adjust = cfun->machine->frame.hard_fp_offset;
++ cfun->machine->frame.final_adjust
++ = cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
++ }
++ else if (!frame_pointer_needed
++ && varargs_and_saved_regs_size < max_push_offset)
++ {
++ /* Frame with large local area and outgoing arguments (this pushes the
++ callee-saves first, followed by the locals and outgoing area):
++ stp reg1, reg2, [sp, -varargs_and_saved_regs_size]!
++ stp reg3, reg4, [sp, 16]
++ sub sp, sp, frame_size - varargs_and_saved_regs_size */
++ cfun->machine->frame.callee_adjust = varargs_and_saved_regs_size;
++ cfun->machine->frame.final_adjust
++ = cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
++ cfun->machine->frame.hard_fp_offset = cfun->machine->frame.callee_adjust;
++ cfun->machine->frame.locals_offset = cfun->machine->frame.hard_fp_offset;
++ }
++ else
++ {
++ /* Frame with large local area and outgoing arguments using frame pointer:
++ sub sp, sp, hard_fp_offset
++ stp x29, x30, [sp, 0]
++ add x29, sp, 0
++ stp reg3, reg4, [sp, 16]
++ sub sp, sp, outgoing_args_size */
++ cfun->machine->frame.initial_adjust = cfun->machine->frame.hard_fp_offset;
++ cfun->machine->frame.final_adjust
++ = cfun->machine->frame.frame_size - cfun->machine->frame.initial_adjust;
++ }
++
+ cfun->machine->frame.laid_out = true;
}
- /* Return true if register REGNO is a valid index register.
-@@ -4232,14 +4344,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
- && GET_CODE (x) == NEG)
- return CC_Zmode;
+@@ -2631,10 +2947,14 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
+ }
-- /* A compare of a mode narrower than SI mode against zero can be done
-- by extending the value in the comparison. */
-- if ((GET_MODE (x) == QImode || GET_MODE (x) == HImode)
-- && y == const0_rtx)
-- /* Only use sign-extension if we really need it. */
-- return ((code == GT || code == GE || code == LE || code == LT)
-- ? CC_SESWPmode : CC_ZESWPmode);
--
- /* A test for unsigned overflow. */
- if ((GET_MODE (x) == DImode || GET_MODE (x) == TImode)
- && code == NE
-@@ -4308,8 +4412,6 @@ aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
- break;
+ static void
+-aarch64_pushwb_pair_reg (machine_mode mode, unsigned regno1,
+- unsigned regno2, HOST_WIDE_INT adjustment)
++aarch64_push_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment)
+ {
+ rtx_insn *insn;
++ machine_mode mode = (regno1 <= R30_REGNUM) ? DImode : DFmode;
++
++ if (regno2 == INVALID_REGNUM)
++ return aarch64_pushwb_single_reg (mode, regno1, adjustment);
++
+ rtx reg1 = gen_rtx_REG (mode, regno1);
+ rtx reg2 = gen_rtx_REG (mode, regno2);
- case CC_SWPmode:
-- case CC_ZESWPmode:
-- case CC_SESWPmode:
- switch (comp_code)
- {
- case NE: return AARCH64_NE;
-@@ -5022,120 +5124,6 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
- return x;
+@@ -2662,6 +2982,30 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
+ }
}
--/* Try a machine-dependent way of reloading an illegitimate address
-- operand. If we find one, push the reload and return the new rtx. */
++static void
++aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment,
++ rtx *cfi_ops)
++{
++ machine_mode mode = (regno1 <= R30_REGNUM) ? DImode : DFmode;
++ rtx reg1 = gen_rtx_REG (mode, regno1);
++
++ *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg1, *cfi_ops);
++
++ if (regno2 == INVALID_REGNUM)
++ {
++ rtx mem = plus_constant (Pmode, stack_pointer_rtx, adjustment);
++ mem = gen_rtx_POST_MODIFY (Pmode, stack_pointer_rtx, mem);
++ emit_move_insn (reg1, gen_rtx_MEM (mode, mem));
++ }
++ else
++ {
++ rtx reg2 = gen_rtx_REG (mode, regno2);
++ *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops);
++ emit_insn (aarch64_gen_loadwb_pair (mode, stack_pointer_rtx, reg1,
++ reg2, adjustment));
++ }
++}
++
+ static rtx
+ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
+ rtx reg2)
+@@ -2848,23 +3192,16 @@ aarch64_restore_callee_saves (machine_mode mode,
+ void
+ aarch64_expand_prologue (void)
+ {
+- /* sub sp, sp, #<frame_size>
+- stp {fp, lr}, [sp, #<frame_size> - 16]
+- add fp, sp, #<frame_size> - hardfp_offset
+- stp {cs_reg}, [fp, #-16] etc.
-
--rtx
--aarch64_legitimize_reload_address (rtx *x_p,
-- machine_mode mode,
-- int opnum, int type,
-- int ind_levels ATTRIBUTE_UNUSED)
--{
-- rtx x = *x_p;
+- sub sp, sp, <final_adjustment_if_any>
+- */
+- HOST_WIDE_INT frame_size, offset;
+- HOST_WIDE_INT fp_offset; /* Offset from hard FP to SP. */
+- HOST_WIDE_INT hard_fp_offset;
+- rtx_insn *insn;
-
-- /* Do not allow mem (plus (reg, const)) if vector struct mode. */
-- if (aarch64_vect_struct_mode_p (mode)
-- && GET_CODE (x) == PLUS
-- && REG_P (XEXP (x, 0))
-- && CONST_INT_P (XEXP (x, 1)))
+ aarch64_layout_frame ();
+
+- offset = frame_size = cfun->machine->frame.frame_size;
+- hard_fp_offset = cfun->machine->frame.hard_fp_offset;
+- fp_offset = frame_size - hard_fp_offset;
++ HOST_WIDE_INT frame_size = cfun->machine->frame.frame_size;
++ HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
++ HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
++ HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
++ HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
++ unsigned reg1 = cfun->machine->frame.wb_candidate1;
++ unsigned reg2 = cfun->machine->frame.wb_candidate2;
++ rtx_insn *insn;
+
+ if (flag_stack_usage_info)
+ current_function_static_stack_size = frame_size;
+@@ -2881,129 +3218,29 @@ aarch64_expand_prologue (void)
+ aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
+ }
+
+- /* Store pairs and load pairs have a range only -512 to 504. */
+- if (offset >= 512)
- {
-- rtx orig_rtx = x;
-- x = copy_rtx (x);
-- push_reload (orig_rtx, NULL_RTX, x_p, NULL,
-- BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-- opnum, (enum reload_type) type);
-- return x;
-- }
+- /* When the frame has a large size, an initial decrease is done on
+- the stack pointer to jump over the callee-allocated save area for
+- register varargs, the local variable area and/or the callee-saved
+- register area. This will allow the pre-index write-back
+- store pair instructions to be used for setting up the stack frame
+- efficiently. */
+- offset = hard_fp_offset;
+- if (offset >= 512)
+- offset = cfun->machine->frame.saved_regs_size;
-
-- /* We must recognize output that we have already generated ourselves. */
-- if (GET_CODE (x) == PLUS
-- && GET_CODE (XEXP (x, 0)) == PLUS
-- && REG_P (XEXP (XEXP (x, 0), 0))
-- && CONST_INT_P (XEXP (XEXP (x, 0), 1))
-- && CONST_INT_P (XEXP (x, 1)))
-- {
-- push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-- BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-- opnum, (enum reload_type) type);
-- return x;
+- frame_size -= (offset + crtl->outgoing_args_size);
+- fp_offset = 0;
++ aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -initial_adjust, true);
+
+- if (frame_size >= 0x1000000)
+- {
+- rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
+- emit_move_insn (op0, GEN_INT (-frame_size));
+- insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
+-
+- add_reg_note (insn, REG_CFA_ADJUST_CFA,
+- gen_rtx_SET (stack_pointer_rtx,
+- plus_constant (Pmode, stack_pointer_rtx,
+- -frame_size)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- }
+- else if (frame_size > 0)
+- {
+- int hi_ofs = frame_size & 0xfff000;
+- int lo_ofs = frame_size & 0x000fff;
++ if (callee_adjust != 0)
++ aarch64_push_regs (reg1, reg2, callee_adjust);
+
+- if (hi_ofs)
+- {
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx, GEN_INT (-hi_ofs)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- }
+- if (lo_ofs)
+- {
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx, GEN_INT (-lo_ofs)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- }
+- }
- }
+- else
+- frame_size = -1;
-
-- /* We wish to handle large displacements off a base register by splitting
-- the addend across an add and the mem insn. This can cut the number of
-- extra insns needed from 3 to 1. It is only useful for load/store of a
-- single register with 12 bit offset field. */
-- if (GET_CODE (x) == PLUS
-- && REG_P (XEXP (x, 0))
-- && CONST_INT_P (XEXP (x, 1))
-- && HARD_REGISTER_P (XEXP (x, 0))
-- && mode != TImode
-- && mode != TFmode
-- && aarch64_regno_ok_for_base_p (REGNO (XEXP (x, 0)), true))
-- {
-- HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
-- HOST_WIDE_INT low = val & 0xfff;
-- HOST_WIDE_INT high = val - low;
-- HOST_WIDE_INT offs;
-- rtx cst;
-- machine_mode xmode = GET_MODE (x);
+- if (offset > 0)
++ if (frame_pointer_needed)
+ {
+- bool skip_wb = false;
-
-- /* In ILP32, xmode can be either DImode or SImode. */
-- gcc_assert (xmode == DImode || xmode == SImode);
+- if (frame_pointer_needed)
+- {
+- skip_wb = true;
-
-- /* Reload non-zero BLKmode offsets. This is because we cannot ascertain
-- BLKmode alignment. */
-- if (GET_MODE_SIZE (mode) == 0)
-- return NULL_RTX;
+- if (fp_offset)
+- {
+- insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
+- GEN_INT (-offset)));
+- RTX_FRAME_RELATED_P (insn) = 1;
-
-- offs = low % GET_MODE_SIZE (mode);
+- aarch64_save_callee_saves (DImode, fp_offset, R29_REGNUM,
+- R30_REGNUM, false);
+- }
+- else
+- aarch64_pushwb_pair_reg (DImode, R29_REGNUM, R30_REGNUM, offset);
-
-- /* Align misaligned offset by adjusting high part to compensate. */
-- if (offs != 0)
+- /* Set up frame pointer to point to the location of the
+- previous frame pointer on the stack. */
+- insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
+- stack_pointer_rtx,
+- GEN_INT (fp_offset)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+- }
+- else
- {
-- if (aarch64_uimm12_shift (high + offs))
+- unsigned reg1 = cfun->machine->frame.wb_candidate1;
+- unsigned reg2 = cfun->machine->frame.wb_candidate2;
+-
+- if (fp_offset
+- || reg1 == FIRST_PSEUDO_REGISTER
+- || (reg2 == FIRST_PSEUDO_REGISTER
+- && offset >= 256))
- {
-- /* Align down. */
-- low = low - offs;
-- high = high + offs;
+- insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
+- GEN_INT (-offset)));
+- RTX_FRAME_RELATED_P (insn) = 1;
- }
- else
- {
-- /* Align up. */
-- offs = GET_MODE_SIZE (mode) - offs;
-- low = low + offs;
-- high = high + (low & 0x1000) - offs;
-- low &= 0xfff;
+- machine_mode mode1 = (reg1 <= R30_REGNUM) ? DImode : DFmode;
+-
+- skip_wb = true;
+-
+- if (reg2 == FIRST_PSEUDO_REGISTER)
+- aarch64_pushwb_single_reg (mode1, reg1, offset);
+- else
+- aarch64_pushwb_pair_reg (mode1, reg1, reg2, offset);
- }
- }
-
-- /* Check for overflow. */
-- if (high + low != val)
-- return NULL_RTX;
+- aarch64_save_callee_saves (DImode, fp_offset, R0_REGNUM, R30_REGNUM,
+- skip_wb);
+- aarch64_save_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM,
+- skip_wb);
++ if (callee_adjust == 0)
++ aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
++ R30_REGNUM, false);
++ insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
++ stack_pointer_rtx,
++ GEN_INT (callee_offset)));
++ RTX_FRAME_RELATED_P (insn) = 1;
++ emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
+ }
+
+- /* when offset >= 512,
+- sub sp, sp, #<outgoing_args_size> */
+- if (frame_size > -1)
+- {
+- if (crtl->outgoing_args_size > 0)
+- {
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx,
+- GEN_INT (- crtl->outgoing_args_size)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- }
+- }
++ aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
++ callee_adjust != 0 || frame_pointer_needed);
++ aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
++ callee_adjust != 0 || frame_pointer_needed);
++ aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, -final_adjust,
++ !frame_pointer_needed);
+ }
+
+ /* Return TRUE if we can use a simple_return insn.
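
The rewritten prologue above replaces the old offset-threshold logic with three explicit adjustments taken from the frame layout: an initial stack drop (initial_adjust), an optional write-back push of the two wb_candidate registers (callee_adjust), callee saves addressed at callee_offset, and a final drop for the outgoing-argument area (final_adjust); the epilogue hunk below undoes them in reverse. A minimal standalone model of that split, assuming the usual invariant that the three adjustments cover the whole frame (names are illustrative, not GCC code):

#include <assert.h>
#include <stdint.h>

/* Rough model of the new prologue shape:
     sub  sp, sp, #initial_adjust          ; area above the register saves
     stp  x29, x30, [sp, #-callee_adjust]! ; optional write-back push
     add  x29, sp, #callee_offset          ; frame pointer, if needed
     ...  remaining callee saves at callee_offset ...
     sub  sp, sp, #final_adjust            ; outgoing argument area      */
struct frame_model
{
  int64_t frame_size;
  int64_t initial_adjust;
  int64_t callee_adjust;   /* bounded by the STP pre-index immediate range */
  int64_t final_adjust;
};

static void check_frame (const struct frame_model *f)
{
  /* The three adjustments must add up to the whole frame.  */
  assert (f->initial_adjust + f->callee_adjust + f->final_adjust
          == f->frame_size);
}

int main (void)
{
  struct frame_model small_frame = { 96, 0, 96, 0 };         /* single push */
  struct frame_model large_frame = { 70000, 69104, 0, 896 }; /* all pieces  */
  check_frame (&small_frame);
  check_frame (&large_frame);
  return 0;
}
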
+@@ -3026,150 +3263,79 @@ aarch64_use_return_insn_p (void)
+ return cfun->machine->frame.frame_size == 0;
+ }
+
+-/* Generate the epilogue instructions for returning from a function. */
++/* Generate the epilogue instructions for returning from a function.
++ This is almost exactly the reverse of the prolog sequence, except
++ that we need to insert barriers to avoid scheduling loads that read
++ from a deallocated stack, and we optimize the unwind records by
++ emitting them all together if possible. */
+ void
+ aarch64_expand_epilogue (bool for_sibcall)
+ {
+- HOST_WIDE_INT frame_size, offset;
+- HOST_WIDE_INT fp_offset;
+- HOST_WIDE_INT hard_fp_offset;
+- rtx_insn *insn;
+- /* We need to add memory barrier to prevent read from deallocated stack. */
+- bool need_barrier_p = (get_frame_size () != 0
+- || cfun->machine->frame.saved_varargs_size);
-
-- cst = GEN_INT (high);
-- if (!aarch64_uimm12_shift (high))
-- cst = force_const_mem (xmode, cst);
+ aarch64_layout_frame ();
+
+- offset = frame_size = cfun->machine->frame.frame_size;
+- hard_fp_offset = cfun->machine->frame.hard_fp_offset;
+- fp_offset = frame_size - hard_fp_offset;
++ HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
++ HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
++ HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
++ HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
++ unsigned reg1 = cfun->machine->frame.wb_candidate1;
++ unsigned reg2 = cfun->machine->frame.wb_candidate2;
++ rtx cfi_ops = NULL;
++ rtx_insn *insn;
+
+- /* Store pairs and load pairs have a range only -512 to 504. */
+- if (offset >= 512)
+- {
+- offset = hard_fp_offset;
+- if (offset >= 512)
+- offset = cfun->machine->frame.saved_regs_size;
++ /* We need to add memory barrier to prevent read from deallocated stack. */
++ bool need_barrier_p = (get_frame_size ()
++ + cfun->machine->frame.saved_varargs_size) != 0;
+
+- frame_size -= (offset + crtl->outgoing_args_size);
+- fp_offset = 0;
+- if (!frame_pointer_needed && crtl->outgoing_args_size > 0)
+- {
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx,
+- GEN_INT (crtl->outgoing_args_size)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- }
++ /* Emit a barrier to prevent loads from a deallocated stack. */
++ if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca)
++ {
++ emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
++ need_barrier_p = false;
+ }
+- else
+- frame_size = -1;
+
+- /* If there were outgoing arguments or we've done dynamic stack
+- allocation, then restore the stack pointer from the frame
+- pointer. This is at most one insn and more efficient than using
+- GCC's internal mechanism. */
+- if (frame_pointer_needed
+- && (crtl->outgoing_args_size || cfun->calls_alloca))
++ /* Restore the stack pointer from the frame pointer if it may not
++ be the same as the stack pointer. */
++ if (frame_pointer_needed && (final_adjust || cfun->calls_alloca))
+ {
+- if (cfun->calls_alloca)
+- emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
-
-- /* Reload high part into base reg, leaving the low part
-- in the mem instruction.
-- Note that replacing this gen_rtx_PLUS with plus_constant is
-- wrong in this case because we rely on the
-- (plus (plus reg c1) c2) structure being preserved so that
-- XEXP (*p, 0) in push_reload below uses the correct term. */
-- x = gen_rtx_PLUS (xmode,
-- gen_rtx_PLUS (xmode, XEXP (x, 0), cst),
-- GEN_INT (low));
+ insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
+ hard_frame_pointer_rtx,
+- GEN_INT (0)));
+- offset = offset - fp_offset;
++ GEN_INT (-callee_offset)));
++ /* If writeback is used when restoring callee-saves, the CFA
++ is restored on the instruction doing the writeback. */
++ RTX_FRAME_RELATED_P (insn) = callee_adjust == 0;
+ }
++ else
++ aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, final_adjust, true);
+
+- if (offset > 0)
+- {
+- unsigned reg1 = cfun->machine->frame.wb_candidate1;
+- unsigned reg2 = cfun->machine->frame.wb_candidate2;
+- bool skip_wb = true;
+- rtx cfi_ops = NULL;
-
-- push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-- BASE_REG_CLASS, xmode, VOIDmode, 0, 0,
-- opnum, (enum reload_type) type);
-- return x;
-- }
+- if (frame_pointer_needed)
+- fp_offset = 0;
+- else if (fp_offset
+- || reg1 == FIRST_PSEUDO_REGISTER
+- || (reg2 == FIRST_PSEUDO_REGISTER
+- && offset >= 256))
+- skip_wb = false;
-
-- return NULL_RTX;
--}
+- aarch64_restore_callee_saves (DImode, fp_offset, R0_REGNUM, R30_REGNUM,
+- skip_wb, &cfi_ops);
+- aarch64_restore_callee_saves (DFmode, fp_offset, V0_REGNUM, V31_REGNUM,
+- skip_wb, &cfi_ops);
-
+- if (need_barrier_p)
+- emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
-
- /* Return the reload icode required for a constant pool in mode. */
- static enum insn_code
- aarch64_constant_pool_reload_icode (machine_mode mode)
-@@ -6411,10 +6399,6 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
- /* TODO: A write to the CC flags possibly costs extra, this
- needs encoding in the cost tables. */
+- if (skip_wb)
+- {
+- machine_mode mode1 = (reg1 <= R30_REGNUM) ? DImode : DFmode;
+- rtx rreg1 = gen_rtx_REG (mode1, reg1);
++ aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
++ callee_adjust != 0, &cfi_ops);
++ aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
++ callee_adjust != 0, &cfi_ops);
-- /* CC_ZESWPmode supports zero extend for free. */
-- if (mode == CC_ZESWPmode && GET_CODE (op0) == ZERO_EXTEND)
-- op0 = XEXP (op0, 0);
+- cfi_ops = alloc_reg_note (REG_CFA_RESTORE, rreg1, cfi_ops);
+- if (reg2 == FIRST_PSEUDO_REGISTER)
+- {
+- rtx mem = plus_constant (Pmode, stack_pointer_rtx, offset);
+- mem = gen_rtx_POST_MODIFY (Pmode, stack_pointer_rtx, mem);
+- mem = gen_rtx_MEM (mode1, mem);
+- insn = emit_move_insn (rreg1, mem);
+- }
+- else
+- {
+- rtx rreg2 = gen_rtx_REG (mode1, reg2);
++ if (need_barrier_p)
++ emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
+
+- cfi_ops = alloc_reg_note (REG_CFA_RESTORE, rreg2, cfi_ops);
+- insn = emit_insn (aarch64_gen_loadwb_pair
+- (mode1, stack_pointer_rtx, rreg1,
+- rreg2, offset));
+- }
+- }
+- else
+- {
+- insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
+- GEN_INT (offset)));
+- }
++ if (callee_adjust != 0)
++ aarch64_pop_regs (reg1, reg2, callee_adjust, &cfi_ops);
+
+- /* Reset the CFA to be SP + FRAME_SIZE. */
+- rtx new_cfa = stack_pointer_rtx;
+- if (frame_size > 0)
+- new_cfa = plus_constant (Pmode, new_cfa, frame_size);
+- cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+- REG_NOTES (insn) = cfi_ops;
++ if (callee_adjust != 0 || initial_adjust > 65536)
++ {
++ /* Emit delayed restores and set the CFA to be SP + initial_adjust. */
++ insn = get_last_insn ();
++ rtx new_cfa = plus_constant (Pmode, stack_pointer_rtx, initial_adjust);
++ REG_NOTES (insn) = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+ RTX_FRAME_RELATED_P (insn) = 1;
++ cfi_ops = NULL;
+ }
+
+- if (frame_size > 0)
+- {
+- if (need_barrier_p)
+- emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
-
- mode = GET_MODE (op0);
- /* ANDS. */
- if (GET_CODE (op0) == AND)
-@@ -7452,12 +7436,12 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
- to optimize 1.0/sqrt. */
+- if (frame_size >= 0x1000000)
+- {
+- rtx op0 = gen_rtx_REG (Pmode, IP0_REGNUM);
+- emit_move_insn (op0, GEN_INT (frame_size));
+- insn = emit_insn (gen_add2_insn (stack_pointer_rtx, op0));
+- }
+- else
+- {
+- int hi_ofs = frame_size & 0xfff000;
+- int lo_ofs = frame_size & 0x000fff;
++ aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, initial_adjust, true);
- static bool
--use_rsqrt_p (void)
-+use_rsqrt_p (machine_mode mode)
- {
- return (!flag_trapping_math
- && flag_unsafe_math_optimizations
-- && ((aarch64_tune_params.extra_tuning_flags
-- & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
-+ && ((aarch64_tune_params.approx_modes->recip_sqrt
-+ & AARCH64_APPROX_MODE (mode))
- || flag_mrecip_low_precision_sqrt));
- }
+- if (hi_ofs && lo_ofs)
+- {
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx, GEN_INT (hi_ofs)));
+- RTX_FRAME_RELATED_P (insn) = 1;
+- frame_size = lo_ofs;
+- }
+- insn = emit_insn (gen_add2_insn
+- (stack_pointer_rtx, GEN_INT (frame_size)));
+- }
+-
+- /* Reset the CFA to be SP + 0. */
+- add_reg_note (insn, REG_CFA_DEF_CFA, stack_pointer_rtx);
++ if (cfi_ops)
++ {
++ /* Emit delayed restores and reset the CFA to be SP. */
++ insn = get_last_insn ();
++ cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, cfi_ops);
++ REG_NOTES (insn) = cfi_ops;
+ RTX_FRAME_RELATED_P (insn) = 1;
+ }
-@@ -7467,89 +7451,217 @@ use_rsqrt_p (void)
- static tree
- aarch64_builtin_reciprocal (tree fndecl)
- {
-- if (!use_rsqrt_p ())
-+ machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl));
-+
-+ if (!use_rsqrt_p (mode))
- return NULL_TREE;
- return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+@@ -3237,122 +3403,6 @@ aarch64_final_eh_return_addr (void)
+ - 2 * UNITS_PER_WORD));
}
- typedef rtx (*rsqrte_type) (rtx, rtx);
+-/* Possibly output code to build up a constant in a register. For
+- the benefit of the costs infrastructure, returns the number of
+- instructions which would be emitted. GENERATE inhibits or
+- enables code generation. */
+-
+-static int
+-aarch64_build_constant (int regnum, HOST_WIDE_INT val, bool generate)
+-{
+- int insns = 0;
+-
+- if (aarch64_bitmask_imm (val, DImode))
+- {
+- if (generate)
+- emit_move_insn (gen_rtx_REG (Pmode, regnum), GEN_INT (val));
+- insns = 1;
+- }
+- else
+- {
+- int i;
+- int ncount = 0;
+- int zcount = 0;
+- HOST_WIDE_INT valp = val >> 16;
+- HOST_WIDE_INT valm;
+- HOST_WIDE_INT tval;
+-
+- for (i = 16; i < 64; i += 16)
+- {
+- valm = (valp & 0xffff);
+-
+- if (valm != 0)
+- ++ zcount;
+-
+- if (valm != 0xffff)
+- ++ ncount;
+-
+- valp >>= 16;
+- }
+-
+- /* zcount contains the number of additional MOVK instructions
+- required if the constant is built up with an initial MOVZ instruction,
+- while ncount is the number of MOVK instructions required if starting
+- with a MOVN instruction. Choose the sequence that yields the fewest
+- number of instructions, preferring MOVZ instructions when they are both
+- the same. */
+- if (ncount < zcount)
+- {
+- if (generate)
+- emit_move_insn (gen_rtx_REG (Pmode, regnum),
+- GEN_INT (val | ~(HOST_WIDE_INT) 0xffff));
+- tval = 0xffff;
+- insns++;
+- }
+- else
+- {
+- if (generate)
+- emit_move_insn (gen_rtx_REG (Pmode, regnum),
+- GEN_INT (val & 0xffff));
+- tval = 0;
+- insns++;
+- }
+-
+- val >>= 16;
+-
+- for (i = 16; i < 64; i += 16)
+- {
+- if ((val & 0xffff) != tval)
+- {
+- if (generate)
+- emit_insn (gen_insv_immdi (gen_rtx_REG (Pmode, regnum),
+- GEN_INT (i),
+- GEN_INT (val & 0xffff)));
+- insns++;
+- }
+- val >>= 16;
+- }
+- }
+- return insns;
+-}
+-
+-static void
+-aarch64_add_constant (int regnum, int scratchreg, HOST_WIDE_INT delta)
+-{
+- HOST_WIDE_INT mdelta = delta;
+- rtx this_rtx = gen_rtx_REG (Pmode, regnum);
+- rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
+-
+- if (mdelta < 0)
+- mdelta = -mdelta;
+-
+- if (mdelta >= 4096 * 4096)
+- {
+- (void) aarch64_build_constant (scratchreg, delta, true);
+- emit_insn (gen_add3_insn (this_rtx, this_rtx, scratch_rtx));
+- }
+- else if (mdelta > 0)
+- {
+- if (mdelta >= 4096)
+- {
+- emit_insn (gen_rtx_SET (scratch_rtx, GEN_INT (mdelta / 4096)));
+- rtx shift = gen_rtx_ASHIFT (Pmode, scratch_rtx, GEN_INT (12));
+- if (delta < 0)
+- emit_insn (gen_rtx_SET (this_rtx,
+- gen_rtx_MINUS (Pmode, this_rtx, shift)));
+- else
+- emit_insn (gen_rtx_SET (this_rtx,
+- gen_rtx_PLUS (Pmode, this_rtx, shift)));
+- }
+- if (mdelta % 4096 != 0)
+- {
+- scratch_rtx = GEN_INT ((delta < 0 ? -1 : 1) * (mdelta % 4096));
+- emit_insn (gen_rtx_SET (this_rtx,
+- gen_rtx_PLUS (Pmode, this_rtx, scratch_rtx)));
+- }
+- }
+-}
+-
+ /* Output code to add DELTA to the first argument, and then jump
+ to FUNCTION. Used for C++ multiple inheritance. */
+ static void
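
The two helpers deleted above show how the old code materialised constants and register adjustments by hand: aarch64_build_constant counted the 16-bit chunks of the value to pick a MOVZ- or MOVN-based MOVK sequence, and aarch64_add_constant split a delta into a shifted 12-bit part plus a low 12-bit part so each step fits an add/sub immediate; the replacement routes through aarch64_internal_mov_immediate and the new five-argument aarch64_add_constant. A standalone sketch of the old splitting arithmetic on plain integers (illustrative only; not the GCC helpers):

#include <assert.h>
#include <stdint.h>

/* Apply DELTA to REG the way the removed aarch64_add_constant did:
   add/sub immediates hold 12 bits, optionally shifted left by 12, so any
   |delta| below 4096*4096 is split into at most two instructions.  */
static int64_t add_delta_in_steps (int64_t reg, int64_t delta)
{
  uint64_t mdelta = delta < 0 ? -(uint64_t) delta : (uint64_t) delta;
  int64_t sign = delta < 0 ? -1 : 1;

  /* Larger deltas need the constant built in a scratch register first.  */
  assert (mdelta < 4096ULL * 4096ULL);

  if (mdelta >= 4096)
    reg += sign * (int64_t) ((mdelta / 4096) << 12);  /* add/sub #hi, lsl #12 */
  if (mdelta % 4096 != 0)
    reg += sign * (int64_t) (mdelta % 4096);          /* add/sub #lo */
  return reg;
}

int main (void)
{
  assert (add_delta_in_steps (0x1000, 0x5678) == 0x1000 + 0x5678);
  assert (add_delta_in_steps (0x9000, -0x5678) == 0x9000 - 0x5678);
  return 0;
}
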
+@@ -3373,7 +3423,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
+ emit_note (NOTE_INSN_PROLOGUE_END);
--/* Select reciprocal square root initial estimate
-- insn depending on machine mode. */
-+/* Select reciprocal square root initial estimate insn depending on machine
-+ mode. */
+ if (vcall_offset == 0)
+- aarch64_add_constant (this_regno, IP1_REGNUM, delta);
++ aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta, false);
+ else
+ {
+ gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
+@@ -3389,7 +3439,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
+ addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
+ plus_constant (Pmode, this_rtx, delta));
+ else
+- aarch64_add_constant (this_regno, IP1_REGNUM, delta);
++ aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta, false);
+ }
--rsqrte_type
-+static rsqrte_type
- get_rsqrte_type (machine_mode mode)
- {
- switch (mode)
- {
-- case DFmode: return gen_aarch64_rsqrte_df2;
-- case SFmode: return gen_aarch64_rsqrte_sf2;
-- case V2DFmode: return gen_aarch64_rsqrte_v2df2;
-- case V2SFmode: return gen_aarch64_rsqrte_v2sf2;
-- case V4SFmode: return gen_aarch64_rsqrte_v4sf2;
-+ case DFmode: return gen_aarch64_rsqrtedf;
-+ case SFmode: return gen_aarch64_rsqrtesf;
-+ case V2DFmode: return gen_aarch64_rsqrtev2df;
-+ case V2SFmode: return gen_aarch64_rsqrtev2sf;
-+ case V4SFmode: return gen_aarch64_rsqrtev4sf;
- default: gcc_unreachable ();
- }
+ if (Pmode == ptr_mode)
+@@ -3403,7 +3453,8 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
+ addr = plus_constant (Pmode, temp0, vcall_offset);
+ else
+ {
+- (void) aarch64_build_constant (IP1_REGNUM, vcall_offset, true);
++ aarch64_internal_mov_immediate (temp1, GEN_INT (vcall_offset), true,
++ Pmode);
+ addr = gen_rtx_PLUS (Pmode, temp0, temp1);
+ }
+
+@@ -3582,7 +3633,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
+ return aarch64_tls_referenced_p (x);
}
- typedef rtx (*rsqrts_type) (rtx, rtx, rtx);
+-/* Implement TARGET_CASE_VALUES_THRESHOLD. */
++/* Implement TARGET_CASE_VALUES_THRESHOLD.
++ The expansion for a table switch is quite expensive due to the number
++ of instructions, the table lookup and hard to predict indirect jump.
++ When optimizing for speed, and -O3 enabled, use the per-core tuning if
++ set, otherwise use tables for > 16 cases as a tradeoff between size and
++ performance. When optimizing for size, use the default setting. */
--/* Select reciprocal square root Newton-Raphson step
-- insn depending on machine mode. */
-+/* Select reciprocal square root series step insn depending on machine mode. */
+ static unsigned int
+ aarch64_case_values_threshold (void)
+@@ -3593,7 +3649,7 @@ aarch64_case_values_threshold (void)
+ && selected_cpu->tune->max_case_values != 0)
+ return selected_cpu->tune->max_case_values;
+ else
+- return default_case_values_threshold ();
++ return optimize_size ? default_case_values_threshold () : 17;
+ }
--rsqrts_type
-+static rsqrts_type
- get_rsqrts_type (machine_mode mode)
- {
- switch (mode)
- {
-- case DFmode: return gen_aarch64_rsqrts_df3;
-- case SFmode: return gen_aarch64_rsqrts_sf3;
-- case V2DFmode: return gen_aarch64_rsqrts_v2df3;
-- case V2SFmode: return gen_aarch64_rsqrts_v2sf3;
-- case V4SFmode: return gen_aarch64_rsqrts_v4sf3;
-+ case DFmode: return gen_aarch64_rsqrtsdf;
-+ case SFmode: return gen_aarch64_rsqrtssf;
-+ case V2DFmode: return gen_aarch64_rsqrtsv2df;
-+ case V2SFmode: return gen_aarch64_rsqrtsv2sf;
-+ case V4SFmode: return gen_aarch64_rsqrtsv4sf;
- default: gcc_unreachable ();
- }
+ /* Return true if register REGNO is a valid index register.
+@@ -3928,9 +3984,11 @@ aarch64_classify_address (struct aarch64_address_info *info,
+ X,X: 7-bit signed scaled offset
+ Q: 9-bit signed offset
+ We conservatively require an offset representable in either mode.
+- */
++ When performing the check for pairs of X registers i.e. LDP/STP
++ pass down DImode since that is the natural size of the LDP/STP
++ instruction memory accesses. */
+ if (mode == TImode || mode == TFmode)
+- return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
++ return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
+ && offset_9bit_signed_unscaled_p (mode, offset));
+
+ /* A 7bit offset check because OImode will emit a ldp/stp
+@@ -4038,7 +4096,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
+ return ((GET_CODE (sym) == LABEL_REF
+ || (GET_CODE (sym) == SYMBOL_REF
+ && CONSTANT_POOL_ADDRESS_P (sym)
+- && !aarch64_nopcrelative_literal_loads)));
++ && aarch64_pcrelative_literal_loads)));
+ }
+ return false;
+
+@@ -4132,6 +4190,24 @@ aarch64_legitimate_address_p (machine_mode mode, rtx x,
+ return aarch64_classify_address (&addr, x, mode, outer_code, strict_p);
}
--/* Emit instruction sequence to compute the reciprocal square root using the
-- Newton-Raphson series. Iterate over the series twice for SF
-- and thrice for DF. */
-+/* Emit instruction sequence to compute either the approximate square root
-+ or its approximate reciprocal, depending on the flag RECP, and return
-+ whether the sequence was emitted or not. */
++/* Split an out-of-range address displacement into a base and offset.
++ Use 4KB range for 1- and 2-byte accesses and a 16KB range otherwise
++ to increase opportunities for sharing the base address of different sizes.
++ For TI/TFmode and unaligned accesses use a 256-byte range. */
++static bool
++aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode)
++{
++ HOST_WIDE_INT mask = GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3fff;
++
++ if (mode == TImode || mode == TFmode ||
++ (INTVAL (*disp) & (GET_MODE_SIZE (mode) - 1)) != 0)
++ mask = 0xff;
++
++ *off = GEN_INT (INTVAL (*disp) & ~mask);
++ *disp = GEN_INT (INTVAL (*disp) & mask);
++ return true;
++}
++
+ /* Return TRUE if rtx X is immediate constant 0.0 */
+ bool
+ aarch64_float_const_zero_rtx_p (rtx x)
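
The aarch64_legitimize_address_displacement hook added above splits an out-of-range memory offset into an anchor plus a small residual, so nearby accesses of different sizes can share one anchored base register: a 4KB window for 1- and 2-byte accesses, 16KB otherwise, and 256 bytes for TI/TFmode or misaligned offsets. A standalone sketch of the same masking on plain integers (illustrative; byte sizes stand in for machine modes):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split DISP into an anchor (*OFF) plus a small residual (*RESIDUAL),
   mirroring the masks chosen by the new hook.  */
static void split_displacement (int64_t disp, unsigned size, bool unaligned,
                                int64_t *off, int64_t *residual)
{
  int64_t mask = size < 4 ? 0xfff : 0x3fff;

  if (size == 16 || unaligned)
    mask = 0xff;

  *off = disp & ~mask;
  *residual = disp & mask;
}

int main (void)
{
  int64_t off, res;

  split_displacement (0x12345, 4, false, &off, &res);
  assert (off + res == 0x12345 && res <= 0x3fff);

  split_displacement (0x12345, 16, false, &off, &res);
  assert (off + res == 0x12345 && res <= 0xff);
  return 0;
}
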
+@@ -4205,6 +4281,14 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
+ && (GET_MODE (x) == HImode || GET_MODE (x) == QImode))
+ return CC_NZmode;
--void
--aarch64_emit_approx_rsqrt (rtx dst, rtx src)
-+bool
-+aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
- {
-- machine_mode mode = GET_MODE (src);
-- gcc_assert (
-- mode == SFmode || mode == V2SFmode || mode == V4SFmode
-- || mode == DFmode || mode == V2DFmode);
++ /* Similarly, comparisons of zero_extends from shorter modes can
++ be performed using an ANDS with an immediate mask. */
++ if (y == const0_rtx && GET_CODE (x) == ZERO_EXTEND
++ && (GET_MODE (x) == SImode || GET_MODE (x) == DImode)
++ && (GET_MODE (XEXP (x, 0)) == HImode || GET_MODE (XEXP (x, 0)) == QImode)
++ && (code == EQ || code == NE))
++ return CC_NZmode;
++
+ if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
+ && y == const0_rtx
+ && (code == EQ || code == NE || code == LT || code == GE)
+@@ -4232,14 +4316,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
+ && GET_CODE (x) == NEG)
+ return CC_Zmode;
+
+- /* A compare of a mode narrower than SI mode against zero can be done
+- by extending the value in the comparison. */
+- if ((GET_MODE (x) == QImode || GET_MODE (x) == HImode)
+- && y == const0_rtx)
+- /* Only use sign-extension if we really need it. */
+- return ((code == GT || code == GE || code == LE || code == LT)
+- ? CC_SESWPmode : CC_ZESWPmode);
-
-- rtx xsrc = gen_reg_rtx (mode);
-- emit_move_insn (xsrc, src);
-- rtx x0 = gen_reg_rtx (mode);
-+ machine_mode mode = GET_MODE (dst);
-+ machine_mode mmsk = mode_for_vector
-+ (int_mode_for_mode (GET_MODE_INNER (mode)),
-+ GET_MODE_NUNITS (mode));
-+ bool use_approx_sqrt_p = (!recp
-+ && (flag_mlow_precision_sqrt
-+ || (aarch64_tune_params.approx_modes->sqrt
-+ & AARCH64_APPROX_MODE (mode))));
-+ bool use_approx_rsqrt_p = (recp
-+ && (flag_mrecip_low_precision_sqrt
-+ || (aarch64_tune_params.approx_modes->recip_sqrt
-+ & AARCH64_APPROX_MODE (mode))));
-+
-+ if (!flag_finite_math_only
-+ || flag_trapping_math
-+ || !flag_unsafe_math_optimizations
-+ || !(use_approx_sqrt_p || use_approx_rsqrt_p)
-+ || optimize_function_for_size_p (cfun))
-+ return false;
+ /* A test for unsigned overflow. */
+ if ((GET_MODE (x) == DImode || GET_MODE (x) == TImode)
+ && code == NE
+@@ -4308,8 +4384,6 @@ aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
+ break;
-- emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));
-+ rtx xmsk = gen_reg_rtx (mmsk);
-+ if (!recp)
-+ /* When calculating the approximate square root, compare the argument with
-+ 0.0 and create a mask. */
-+ emit_insn (gen_rtx_SET (xmsk, gen_rtx_NEG (mmsk, gen_rtx_EQ (mmsk, src,
-+ CONST0_RTX (mode)))));
+ case CC_SWPmode:
+- case CC_ZESWPmode:
+- case CC_SESWPmode:
+ switch (comp_code)
+ {
+ case NE: return AARCH64_NE;
+@@ -4964,7 +5038,7 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
+ if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
+ {
+ rtx base = XEXP (x, 0);
+- rtx offset_rtx XEXP (x, 1);
++ rtx offset_rtx = XEXP (x, 1);
+ HOST_WIDE_INT offset = INTVAL (offset_rtx);
-- bool double_mode = (mode == DFmode || mode == V2DFmode);
-+ /* Estimate the approximate reciprocal square root. */
-+ rtx xdst = gen_reg_rtx (mode);
-+ emit_insn ((*get_rsqrte_type (mode)) (xdst, src));
+ if (GET_CODE (base) == PLUS)
+@@ -5022,120 +5096,6 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
+ return x;
+ }
-- int iterations = double_mode ? 3 : 2;
-+ /* Iterate over the series twice for SF and thrice for DF. */
-+ int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+-/* Try a machine-dependent way of reloading an illegitimate address
+- operand. If we find one, push the reload and return the new rtx. */
+-
+-rtx
+-aarch64_legitimize_reload_address (rtx *x_p,
+- machine_mode mode,
+- int opnum, int type,
+- int ind_levels ATTRIBUTE_UNUSED)
+-{
+- rtx x = *x_p;
+-
+- /* Do not allow mem (plus (reg, const)) if vector struct mode. */
+- if (aarch64_vect_struct_mode_p (mode)
+- && GET_CODE (x) == PLUS
+- && REG_P (XEXP (x, 0))
+- && CONST_INT_P (XEXP (x, 1)))
+- {
+- rtx orig_rtx = x;
+- x = copy_rtx (x);
+- push_reload (orig_rtx, NULL_RTX, x_p, NULL,
+- BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
+- opnum, (enum reload_type) type);
+- return x;
+- }
+-
+- /* We must recognize output that we have already generated ourselves. */
+- if (GET_CODE (x) == PLUS
+- && GET_CODE (XEXP (x, 0)) == PLUS
+- && REG_P (XEXP (XEXP (x, 0), 0))
+- && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+- && CONST_INT_P (XEXP (x, 1)))
+- {
+- push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
+- BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
+- opnum, (enum reload_type) type);
+- return x;
+- }
+-
+- /* We wish to handle large displacements off a base register by splitting
+- the addend across an add and the mem insn. This can cut the number of
+- extra insns needed from 3 to 1. It is only useful for load/store of a
+- single register with 12 bit offset field. */
+- if (GET_CODE (x) == PLUS
+- && REG_P (XEXP (x, 0))
+- && CONST_INT_P (XEXP (x, 1))
+- && HARD_REGISTER_P (XEXP (x, 0))
+- && mode != TImode
+- && mode != TFmode
+- && aarch64_regno_ok_for_base_p (REGNO (XEXP (x, 0)), true))
+- {
+- HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
+- HOST_WIDE_INT low = val & 0xfff;
+- HOST_WIDE_INT high = val - low;
+- HOST_WIDE_INT offs;
+- rtx cst;
+- machine_mode xmode = GET_MODE (x);
+-
+- /* In ILP32, xmode can be either DImode or SImode. */
+- gcc_assert (xmode == DImode || xmode == SImode);
+-
+- /* Reload non-zero BLKmode offsets. This is because we cannot ascertain
+- BLKmode alignment. */
+- if (GET_MODE_SIZE (mode) == 0)
+- return NULL_RTX;
+-
+- offs = low % GET_MODE_SIZE (mode);
+-
+- /* Align misaligned offset by adjusting high part to compensate. */
+- if (offs != 0)
+- {
+- if (aarch64_uimm12_shift (high + offs))
+- {
+- /* Align down. */
+- low = low - offs;
+- high = high + offs;
+- }
+- else
+- {
+- /* Align up. */
+- offs = GET_MODE_SIZE (mode) - offs;
+- low = low + offs;
+- high = high + (low & 0x1000) - offs;
+- low &= 0xfff;
+- }
+- }
+-
+- /* Check for overflow. */
+- if (high + low != val)
+- return NULL_RTX;
+-
+- cst = GEN_INT (high);
+- if (!aarch64_uimm12_shift (high))
+- cst = force_const_mem (xmode, cst);
+-
+- /* Reload high part into base reg, leaving the low part
+- in the mem instruction.
+- Note that replacing this gen_rtx_PLUS with plus_constant is
+- wrong in this case because we rely on the
+- (plus (plus reg c1) c2) structure being preserved so that
+- XEXP (*p, 0) in push_reload below uses the correct term. */
+- x = gen_rtx_PLUS (xmode,
+- gen_rtx_PLUS (xmode, XEXP (x, 0), cst),
+- GEN_INT (low));
+-
+- push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
+- BASE_REG_CLASS, xmode, VOIDmode, 0, 0,
+- opnum, (enum reload_type) type);
+- return x;
+- }
+-
+- return NULL_RTX;
+-}
+-
+-
+ /* Return the reload icode required for a constant pool in mode. */
+ static enum insn_code
+ aarch64_constant_pool_reload_icode (machine_mode mode)
+@@ -5193,7 +5153,7 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
+ if (MEM_P (x) && GET_CODE (x) == SYMBOL_REF && CONSTANT_POOL_ADDRESS_P (x)
+ && (SCALAR_FLOAT_MODE_P (GET_MODE (x))
+ || targetm.vector_mode_supported_p (GET_MODE (x)))
+- && aarch64_nopcrelative_literal_loads)
++ && !aarch64_pcrelative_literal_loads)
+ {
+ sri->icode = aarch64_constant_pool_reload_icode (mode);
+ return NO_REGS;
+@@ -5267,18 +5227,18 @@ aarch64_initial_elimination_offset (unsigned from, unsigned to)
+ if (to == HARD_FRAME_POINTER_REGNUM)
+ {
+ if (from == ARG_POINTER_REGNUM)
+- return cfun->machine->frame.frame_size - crtl->outgoing_args_size;
++ return cfun->machine->frame.hard_fp_offset;
-- /* Optionally iterate over the series one less time than otherwise. */
-- if (flag_mrecip_low_precision_sqrt)
-+ /* Optionally iterate over the series once less for faster performance
-+ while sacrificing the accuracy. */
-+ if ((recp && flag_mrecip_low_precision_sqrt)
-+ || (!recp && flag_mlow_precision_sqrt))
- iterations--;
+ if (from == FRAME_POINTER_REGNUM)
+- return (cfun->machine->frame.hard_fp_offset
+- - cfun->machine->frame.saved_varargs_size);
++ return cfun->machine->frame.hard_fp_offset
++ - cfun->machine->frame.locals_offset;
+ }
-- for (int i = 0; i < iterations; ++i)
-+ /* Iterate over the series to calculate the approximate reciprocal square
-+ root. */
-+ rtx x1 = gen_reg_rtx (mode);
-+ while (iterations--)
+ if (to == STACK_POINTER_REGNUM)
{
-- rtx x1 = gen_reg_rtx (mode);
- rtx x2 = gen_reg_rtx (mode);
-- rtx x3 = gen_reg_rtx (mode);
-- emit_set_insn (x2, gen_rtx_MULT (mode, x0, x0));
-+ emit_set_insn (x2, gen_rtx_MULT (mode, xdst, xdst));
-+
-+ emit_insn ((*get_rsqrts_type (mode)) (x1, src, x2));
+ if (from == FRAME_POINTER_REGNUM)
+- return (cfun->machine->frame.frame_size
+- - cfun->machine->frame.saved_varargs_size);
++ return cfun->machine->frame.frame_size
++ - cfun->machine->frame.locals_offset;
+ }
-- emit_insn ((*get_rsqrts_type (mode)) (x3, xsrc, x2));
-+ if (iterations > 0)
-+ emit_set_insn (xdst, gen_rtx_MULT (mode, xdst, x1));
-+ }
-+
-+ if (!recp)
-+ {
-+ /* Qualify the approximate reciprocal square root when the argument is
-+ 0.0 by squashing the intermediary result to 0.0. */
-+ rtx xtmp = gen_reg_rtx (mmsk);
-+ emit_set_insn (xtmp, gen_rtx_AND (mmsk, gen_rtx_NOT (mmsk, xmsk),
-+ gen_rtx_SUBREG (mmsk, xdst, 0)));
-+ emit_move_insn (xdst, gen_rtx_SUBREG (mode, xtmp, 0));
+ return cfun->machine->frame.frame_size;
+@@ -5527,7 +5487,7 @@ aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
+ static inline bool
+ aarch64_can_use_per_function_literal_pools_p (void)
+ {
+- return (!aarch64_nopcrelative_literal_loads
++ return (aarch64_pcrelative_literal_loads
+ || aarch64_cmodel == AARCH64_CMODEL_LARGE);
+ }
-- emit_set_insn (x1, gen_rtx_MULT (mode, x0, x3));
-- x0 = x1;
-+ /* Calculate the approximate square root. */
-+ emit_set_insn (xdst, gen_rtx_MULT (mode, xdst, src));
- }
+@@ -6146,6 +6106,19 @@ aarch64_extend_bitfield_pattern_p (rtx x)
+ return op;
+ }
-- emit_move_insn (dst, x0);
-+ /* Finalize the approximation. */
-+ emit_set_insn (dst, gen_rtx_MULT (mode, xdst, x1));
-+
-+ return true;
-+}
-+
-+typedef rtx (*recpe_type) (rtx, rtx);
-+
-+/* Select reciprocal initial estimate insn depending on machine mode. */
-+
-+static recpe_type
-+get_recpe_type (machine_mode mode)
-+{
-+ switch (mode)
-+ {
-+ case SFmode: return (gen_aarch64_frecpesf);
-+ case V2SFmode: return (gen_aarch64_frecpev2sf);
-+ case V4SFmode: return (gen_aarch64_frecpev4sf);
-+ case DFmode: return (gen_aarch64_frecpedf);
-+ case V2DFmode: return (gen_aarch64_frecpev2df);
-+ default: gcc_unreachable ();
-+ }
-+}
-+
-+typedef rtx (*recps_type) (rtx, rtx, rtx);
-+
-+/* Select reciprocal series step insn depending on machine mode. */
++/* Return true if the mask and a shift amount from an RTX of the form
++ (x << SHFT_AMNT) & MASK are valid to combine into a UBFIZ instruction of
++ mode MODE. See the *andim_ashift<mode>_bfiz pattern. */
+
-+static recps_type
-+get_recps_type (machine_mode mode)
++bool
++aarch64_mask_and_shift_for_ubfiz_p (machine_mode mode, rtx mask, rtx shft_amnt)
+{
-+ switch (mode)
-+ {
-+ case SFmode: return (gen_aarch64_frecpssf);
-+ case V2SFmode: return (gen_aarch64_frecpsv2sf);
-+ case V4SFmode: return (gen_aarch64_frecpsv4sf);
-+ case DFmode: return (gen_aarch64_frecpsdf);
-+ case V2DFmode: return (gen_aarch64_frecpsv2df);
-+ default: gcc_unreachable ();
-+ }
++ return CONST_INT_P (mask) && CONST_INT_P (shft_amnt)
++ && INTVAL (shft_amnt) < GET_MODE_BITSIZE (mode)
++ && exact_log2 ((INTVAL (mask) >> INTVAL (shft_amnt)) + 1) >= 0
++ && (INTVAL (mask) & ((1 << INTVAL (shft_amnt)) - 1)) == 0;
+}
+
-+/* Emit the instruction sequence to compute the approximation for the division
-+ of NUM by DEN in QUO and return whether the sequence was emitted or not. */
-+
-+bool
-+aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
-+{
-+ machine_mode mode = GET_MODE (quo);
-+ bool use_approx_division_p = (flag_mlow_precision_div
-+ || (aarch64_tune_params.approx_modes->division
-+ & AARCH64_APPROX_MODE (mode)));
+ /* Calculate the cost of calculating X, storing it in *COST. Result
+ is true if the total cost of the operation has now been calculated. */
+ static bool
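
The aarch64_mask_and_shift_for_ubfiz_p predicate added above accepts (x << shift) & mask only when the mask, shifted right by the shift amount, is a contiguous run of low ones and no mask bit lies below the shift point, so the AND folds into the UBFIZ position/width fields; the rtx-cost hunk below then prices such a combination as a single bfx-class operation. A standalone sketch of the same test on plain 64-bit values (illustrative; not the GCC entry point):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE is of the form 0...01...1 (possibly zero).  */
static bool contiguous_low_ones (uint64_t value)
{
  return (value & (value + 1)) == 0;
}

/* Plain-integer version of the UBFIZ check.  */
static bool ubfiz_mask_and_shift_p (uint64_t mask, unsigned shift,
                                    unsigned mode_bits)
{
  return shift < mode_bits
         && contiguous_low_ones (mask >> shift)
         && (mask & ((UINT64_C (1) << shift) - 1)) == 0;
}

int main (void)
{
  /* (x << 8) & 0x00ffff00 is a UBFIZ with lsb 8 and width 16.  */
  assert (ubfiz_mask_and_shift_p (0x00ffff00, 8, 32));
  /* Mask bits below the shift amount cannot come from the shifted value.  */
  assert (!ubfiz_mask_and_shift_p (0x00ffff0f, 8, 32));
  /* A mask with a hole is not a single bitfield.  */
  assert (!ubfiz_mask_and_shift_p (0x00f0ff00, 8, 32));
  return 0;
}
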
+@@ -6411,10 +6384,6 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
+ /* TODO: A write to the CC flags possibly costs extra, this
+ needs encoding in the cost tables. */
+
+- /* CC_ZESWPmode supports zero extend for free. */
+- if (mode == CC_ZESWPmode && GET_CODE (op0) == ZERO_EXTEND)
+- op0 = XEXP (op0, 0);
+-
+ mode = GET_MODE (op0);
+ /* ANDS. */
+ if (GET_CODE (op0) == AND)
+@@ -6724,17 +6693,31 @@ cost_plus:
+
+ if (GET_MODE_CLASS (mode) == MODE_INT)
+ {
+- /* We possibly get the immediate for free, this is not
+- modelled. */
+- if (CONST_INT_P (op1)
+- && aarch64_bitmask_imm (INTVAL (op1), mode))
++ if (CONST_INT_P (op1))
+ {
+- *cost += rtx_cost (op0, mode, (enum rtx_code) code, 0, speed);
++ /* We have a mask + shift version of a UBFIZ
++ i.e. the *andim_ashift<mode>_bfiz pattern. */
++ if (GET_CODE (op0) == ASHIFT
++ && aarch64_mask_and_shift_for_ubfiz_p (mode, op1,
++ XEXP (op0, 1)))
++ {
++ *cost += rtx_cost (XEXP (op0, 0), mode,
++ (enum rtx_code) code, 0, speed);
++ if (speed)
++ *cost += extra_cost->alu.bfx;
+
+- if (speed)
+- *cost += extra_cost->alu.logical;
++ return true;
++ }
++ else if (aarch64_bitmask_imm (INTVAL (op1), mode))
++ {
++ /* We possibly get the immediate for free, this is not
++ modelled. */
++ *cost += rtx_cost (op0, mode, (enum rtx_code) code, 0, speed);
++ if (speed)
++ *cost += extra_cost->alu.logical;
+
+- return true;
++ return true;
++ }
+ }
+ else
+ {
+@@ -6838,11 +6821,12 @@ cost_plus:
+ {
+ int op_cost = rtx_cost (op0, VOIDmode, ZERO_EXTEND, 0, speed);
+
+- if (!op_cost && speed)
+- /* MOV. */
+- *cost += extra_cost->alu.extend;
+- else
+- /* Free, the cost is that of the SI mode operation. */
++ /* If OP_COST is non-zero, then the cost of the zero extend
++ is effectively the cost of the inner operation. Otherwise
++ we have a MOV instruction and we take the cost from the MOV
++ itself. This is true independently of whether we are
++ optimizing for space or time. */
++ if (op_cost)
+ *cost = op_cost;
+
+ return true;
+@@ -6872,8 +6856,8 @@ cost_plus:
+ }
+ else
+ {
+- /* UXTB/UXTH. */
+- *cost += extra_cost->alu.extend;
++ /* We generate an AND instead of UXTB/UXTH. */
++ *cost += extra_cost->alu.logical;
+ }
+ }
+ return false;
+@@ -7452,12 +7436,12 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
+ to optimize 1.0/sqrt. */
+
+ static bool
+-use_rsqrt_p (void)
++use_rsqrt_p (machine_mode mode)
+ {
+ return (!flag_trapping_math
+ && flag_unsafe_math_optimizations
+- && ((aarch64_tune_params.extra_tuning_flags
+- & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
++ && ((aarch64_tune_params.approx_modes->recip_sqrt
++ & AARCH64_APPROX_MODE (mode))
+ || flag_mrecip_low_precision_sqrt));
+ }
+
+@@ -7467,89 +7451,225 @@ use_rsqrt_p (void)
+ static tree
+ aarch64_builtin_reciprocal (tree fndecl)
+ {
+- if (!use_rsqrt_p ())
++ machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl));
++
++ if (!use_rsqrt_p (mode))
+ return NULL_TREE;
+ return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
+ }
+
+ typedef rtx (*rsqrte_type) (rtx, rtx);
+
+-/* Select reciprocal square root initial estimate
+- insn depending on machine mode. */
++/* Select reciprocal square root initial estimate insn depending on machine
++ mode. */
+
+-rsqrte_type
++static rsqrte_type
+ get_rsqrte_type (machine_mode mode)
+ {
+ switch (mode)
+ {
+- case DFmode: return gen_aarch64_rsqrte_df2;
+- case SFmode: return gen_aarch64_rsqrte_sf2;
+- case V2DFmode: return gen_aarch64_rsqrte_v2df2;
+- case V2SFmode: return gen_aarch64_rsqrte_v2sf2;
+- case V4SFmode: return gen_aarch64_rsqrte_v4sf2;
++ case DFmode: return gen_aarch64_rsqrtedf;
++ case SFmode: return gen_aarch64_rsqrtesf;
++ case V2DFmode: return gen_aarch64_rsqrtev2df;
++ case V2SFmode: return gen_aarch64_rsqrtev2sf;
++ case V4SFmode: return gen_aarch64_rsqrtev4sf;
+ default: gcc_unreachable ();
+ }
+ }
+
+ typedef rtx (*rsqrts_type) (rtx, rtx, rtx);
+
+-/* Select reciprocal square root Newton-Raphson step
+- insn depending on machine mode. */
++/* Select reciprocal square root series step insn depending on machine mode. */
+
+-rsqrts_type
++static rsqrts_type
+ get_rsqrts_type (machine_mode mode)
+ {
+ switch (mode)
+ {
+- case DFmode: return gen_aarch64_rsqrts_df3;
+- case SFmode: return gen_aarch64_rsqrts_sf3;
+- case V2DFmode: return gen_aarch64_rsqrts_v2df3;
+- case V2SFmode: return gen_aarch64_rsqrts_v2sf3;
+- case V4SFmode: return gen_aarch64_rsqrts_v4sf3;
++ case DFmode: return gen_aarch64_rsqrtsdf;
++ case SFmode: return gen_aarch64_rsqrtssf;
++ case V2DFmode: return gen_aarch64_rsqrtsv2df;
++ case V2SFmode: return gen_aarch64_rsqrtsv2sf;
++ case V4SFmode: return gen_aarch64_rsqrtsv4sf;
+ default: gcc_unreachable ();
+ }
+ }
+
+-/* Emit instruction sequence to compute the reciprocal square root using the
+- Newton-Raphson series. Iterate over the series twice for SF
+- and thrice for DF. */
++/* Emit instruction sequence to compute either the approximate square root
++ or its approximate reciprocal, depending on the flag RECP, and return
++ whether the sequence was emitted or not. */
+
+-void
+-aarch64_emit_approx_rsqrt (rtx dst, rtx src)
++bool
++aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
+ {
+- machine_mode mode = GET_MODE (src);
+- gcc_assert (
+- mode == SFmode || mode == V2SFmode || mode == V4SFmode
+- || mode == DFmode || mode == V2DFmode);
++ machine_mode mode = GET_MODE (dst);
+
+- rtx xsrc = gen_reg_rtx (mode);
+- emit_move_insn (xsrc, src);
+- rtx x0 = gen_reg_rtx (mode);
++ if (GET_MODE_INNER (mode) == HFmode)
++ return false;
+
+- emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));
++ machine_mode mmsk = mode_for_vector
++ (int_mode_for_mode (GET_MODE_INNER (mode)),
++ GET_MODE_NUNITS (mode));
++ bool use_approx_sqrt_p = (!recp
++ && (flag_mlow_precision_sqrt
++ || (aarch64_tune_params.approx_modes->sqrt
++ & AARCH64_APPROX_MODE (mode))));
++ bool use_approx_rsqrt_p = (recp
++ && (flag_mrecip_low_precision_sqrt
++ || (aarch64_tune_params.approx_modes->recip_sqrt
++ & AARCH64_APPROX_MODE (mode))));
+
+ if (!flag_finite_math_only
+ || flag_trapping_math
+ || !flag_unsafe_math_optimizations
-+ || optimize_function_for_size_p (cfun)
-+ || !use_approx_division_p)
++ || !(use_approx_sqrt_p || use_approx_rsqrt_p)
++ || optimize_function_for_size_p (cfun))
+ return false;
-+
-+ /* Estimate the approximate reciprocal. */
-+ rtx xrcp = gen_reg_rtx (mode);
-+ emit_insn ((*get_recpe_type (mode)) (xrcp, den));
-+
+
+- bool double_mode = (mode == DFmode || mode == V2DFmode);
++ rtx xmsk = gen_reg_rtx (mmsk);
++ if (!recp)
++ /* When calculating the approximate square root, compare the argument with
++ 0.0 and create a mask. */
++ emit_insn (gen_rtx_SET (xmsk, gen_rtx_NEG (mmsk, gen_rtx_EQ (mmsk, src,
++ CONST0_RTX (mode)))));
+
+- int iterations = double_mode ? 3 : 2;
++ /* Estimate the approximate reciprocal square root. */
++ rtx xdst = gen_reg_rtx (mode);
++ emit_insn ((*get_rsqrte_type (mode)) (xdst, src));
+
+- /* Optionally iterate over the series one less time than otherwise. */
+- if (flag_mrecip_low_precision_sqrt)
+ /* Iterate over the series twice for SF and thrice for DF. */
+ int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
-+ /* Optionally iterate over the series once less for faster performance,
++ /* Optionally iterate over the series once less for faster performance
+ while sacrificing the accuracy. */
-+ if (flag_mlow_precision_div)
-+ iterations--;
-+
-+ /* Iterate over the series to calculate the approximate reciprocal. */
-+ rtx xtmp = gen_reg_rtx (mode);
++ if ((recp && flag_mrecip_low_precision_sqrt)
++ || (!recp && flag_mlow_precision_sqrt))
+ iterations--;
+
+- for (int i = 0; i < iterations; ++i)
++ /* Iterate over the series to calculate the approximate reciprocal square
++ root. */
++ rtx x1 = gen_reg_rtx (mode);
+ while (iterations--)
-+ {
-+ emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
+ {
+- rtx x1 = gen_reg_rtx (mode);
+ rtx x2 = gen_reg_rtx (mode);
+- rtx x3 = gen_reg_rtx (mode);
+- emit_set_insn (x2, gen_rtx_MULT (mode, x0, x0));
++ emit_set_insn (x2, gen_rtx_MULT (mode, xdst, xdst));
++
++ emit_insn ((*get_rsqrts_type (mode)) (x1, src, x2));
+
+ if (iterations > 0)
-+ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
++ emit_set_insn (xdst, gen_rtx_MULT (mode, xdst, x1));
+ }
+
-+ if (num != CONST1_RTX (mode))
++ if (!recp)
+ {
-+ /* As the approximate reciprocal of DEN is already calculated, only
-+ calculate the approximate division when NUM is not 1.0. */
-+ rtx xnum = force_reg (mode, num);
-+ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
++ /* Qualify the approximate reciprocal square root when the argument is
++ 0.0 by squashing the intermediary result to 0.0. */
++ rtx xtmp = gen_reg_rtx (mmsk);
++ emit_set_insn (xtmp, gen_rtx_AND (mmsk, gen_rtx_NOT (mmsk, xmsk),
++ gen_rtx_SUBREG (mmsk, xdst, 0)));
++ emit_move_insn (xdst, gen_rtx_SUBREG (mode, xtmp, 0));
++
++ /* Calculate the approximate square root. */
++ emit_set_insn (xdst, gen_rtx_MULT (mode, xdst, src));
+ }
+
+ /* Finalize the approximation. */
-+ emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
++ emit_set_insn (dst, gen_rtx_MULT (mode, xdst, x1));
++
+ return true;
- }
-
- /* Return the number of instructions that can be issued per cycle. */
-@@ -8079,6 +8191,12 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
- && (aarch64_cmodel == AARCH64_CMODEL_TINY
- || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC))
- aarch64_nopcrelative_literal_loads = false;
++}
+
-+ /* When enabling the lower precision Newton series for the square root, also
-+ enable it for the reciprocal square root, since the latter is an
-+ intermediary step for the former. */
-+ if (flag_mlow_precision_sqrt)
-+ flag_mrecip_low_precision_sqrt = true;
- }
-
- /* 'Unpack' up the internal tuning structs and update the options
-@@ -9463,6 +9581,13 @@ aarch64_build_builtin_va_list (void)
- FIELD_DECL, get_identifier ("__vr_offs"),
- integer_type_node);
-
-+ /* Tell tree-stdarg pass about our internal offset fields.
-+ NOTE: va_list_gpr/fpr_counter_field are only used for tree comparision
-+ purpose to identify whether the code is updating va_list internal
-+ offset fields through irregular way. */
-+ va_list_gpr_counter_field = f_groff;
-+ va_list_fpr_counter_field = f_vroff;
++typedef rtx (*recpe_type) (rtx, rtx);
+
- DECL_ARTIFICIAL (f_stack) = 1;
- DECL_ARTIFICIAL (f_grtop) = 1;
- DECL_ARTIFICIAL (f_vrtop) = 1;
-@@ -9495,15 +9620,17 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
- tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
- tree stack, grtop, vrtop, groff, vroff;
- tree t;
-- int gr_save_area_size;
-- int vr_save_area_size;
-+ int gr_save_area_size = cfun->va_list_gpr_size;
-+ int vr_save_area_size = cfun->va_list_fpr_size;
- int vr_offset;
-
- cum = &crtl->args.info;
-- gr_save_area_size
-- = (NUM_ARG_REGS - cum->aapcs_ncrn) * UNITS_PER_WORD;
-- vr_save_area_size
-- = (NUM_FP_ARG_REGS - cum->aapcs_nvrn) * UNITS_PER_VREG;
-+ if (cfun->va_list_gpr_size)
++/* Select reciprocal initial estimate insn depending on machine mode. */
++
++static recpe_type
++get_recpe_type (machine_mode mode)
++{
++ switch (mode)
++ {
++ case SFmode: return (gen_aarch64_frecpesf);
++ case V2SFmode: return (gen_aarch64_frecpev2sf);
++ case V4SFmode: return (gen_aarch64_frecpev4sf);
++ case DFmode: return (gen_aarch64_frecpedf);
++ case V2DFmode: return (gen_aarch64_frecpev2df);
++ default: gcc_unreachable ();
++ }
++}
++
++typedef rtx (*recps_type) (rtx, rtx, rtx);
++
++/* Select reciprocal series step insn depending on machine mode. */
++
++static recps_type
++get_recps_type (machine_mode mode)
++{
++ switch (mode)
++ {
++ case SFmode: return (gen_aarch64_frecpssf);
++ case V2SFmode: return (gen_aarch64_frecpsv2sf);
++ case V4SFmode: return (gen_aarch64_frecpsv4sf);
++ case DFmode: return (gen_aarch64_frecpsdf);
++ case V2DFmode: return (gen_aarch64_frecpsv2df);
++ default: gcc_unreachable ();
++ }
++}
++
++/* Emit the instruction sequence to compute the approximation for the division
++ of NUM by DEN in QUO and return whether the sequence was emitted or not. */
++
++bool
++aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
++{
++ machine_mode mode = GET_MODE (quo);
++
++ if (GET_MODE_INNER (mode) == HFmode)
++ return false;
++
++ bool use_approx_division_p = (flag_mlow_precision_div
++ || (aarch64_tune_params.approx_modes->division
++ & AARCH64_APPROX_MODE (mode)));
++
++ if (!flag_finite_math_only
++ || flag_trapping_math
++ || !flag_unsafe_math_optimizations
++ || optimize_function_for_size_p (cfun)
++ || !use_approx_division_p)
++ return false;
++
++ /* Estimate the approximate reciprocal. */
++ rtx xrcp = gen_reg_rtx (mode);
++ emit_insn ((*get_recpe_type (mode)) (xrcp, den));
+
+- emit_insn ((*get_rsqrts_type (mode)) (x3, xsrc, x2));
++ /* Iterate over the series twice for SF and thrice for DF. */
++ int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
+- emit_set_insn (x1, gen_rtx_MULT (mode, x0, x3));
+- x0 = x1;
++ /* Optionally iterate over the series once less for faster performance,
++ while sacrificing the accuracy. */
++ if (flag_mlow_precision_div)
++ iterations--;
++
++ /* Iterate over the series to calculate the approximate reciprocal. */
++ rtx xtmp = gen_reg_rtx (mode);
++ while (iterations--)
++ {
++ emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
++
++ if (iterations > 0)
++ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+ }
+
+- emit_move_insn (dst, x0);
++ if (num != CONST1_RTX (mode))
++ {
++ /* As the approximate reciprocal of DEN is already calculated, only
++ calculate the approximate division when NUM is not 1.0. */
++ rtx xnum = force_reg (mode, num);
++ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
++ }
++
++ /* Finalize the approximation. */
++ emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
++ return true;
+ }
+
+ /* Return the number of instructions that can be issued per cycle. */
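
The rewritten aarch64_emit_approx_sqrt and the new aarch64_emit_approx_div above share one shape: start from a hardware estimate (FRSQRTE or FRECPE), run a fixed number of FRSQRTS/FRECPS refinement steps, two for single precision and three for double, one fewer under the low-precision flags, then multiply to produce sqrt, 1/sqrt or num/den. A standalone scalar sketch of those recurrences (illustrative; a float-rounded value stands in for the estimate instructions):

#include <assert.h>
#include <math.h>

/* One FRSQRTS-style step is e' = e * (3 - x*e*e) / 2.  */
static double newton_rsqrt (double x, int iterations)
{
  double e = (double) (1.0f / sqrtf ((float) x));  /* estimate stand-in */
  while (iterations--)
    e = e * (3.0 - x * e * e) / 2.0;
  return e;
}

/* One FRECPS-style step is e' = e * (2 - d*e); the quotient is n * e.  */
static double newton_div (double n, double d, int iterations)
{
  double e = (double) (1.0f / (float) d);          /* estimate stand-in */
  while (iterations--)
    e = e * (2.0 - d * e);
  return n * e;
}

int main (void)
{
  /* Three refinement steps, as the patch uses for DFmode.  */
  assert (fabs (newton_rsqrt (2.0, 3) - 1.0 / sqrt (2.0)) < 1e-12);
  assert (fabs (newton_div (355.0, 113.0, 3) - 355.0 / 113.0) < 1e-9);
  return 0;
}
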
+@@ -8053,32 +8173,37 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
+ opts->x_align_functions = aarch64_tune_params.function_align;
+ }
+
+- /* If nopcrelative_literal_loads is set on the command line, this
++ /* We default to no pc-relative literal loads. */
++
++ aarch64_pcrelative_literal_loads = false;
++
++ /* If -mpc-relative-literal-loads is set on the command line, this
+ implies that the user asked for PC relative literal loads. */
+- if (opts->x_nopcrelative_literal_loads == 1)
+- aarch64_nopcrelative_literal_loads = false;
++ if (opts->x_pcrelative_literal_loads == 1)
++ aarch64_pcrelative_literal_loads = true;
+
+- /* If it is not set on the command line, we default to no pc
+- relative literal loads, unless the workaround for Cortex-A53
+- erratum 843419 is in effect. */
+ /* This is PR70113. When building the Linux kernel with
+ CONFIG_ARM64_ERRATUM_843419, support for relocations
+ R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADR_PREL_PG_HI21_NC is
+ removed from the kernel to avoid loading objects with possibly
+- offending sequences. With nopcrelative_literal_loads, we would
++ offending sequences. Without -mpc-relative-literal-loads we would
+ generate such relocations, preventing the kernel build from
+ succeeding. */
+- if (opts->x_nopcrelative_literal_loads == 2
+- && !TARGET_FIX_ERR_A53_843419)
+- aarch64_nopcrelative_literal_loads = true;
++ if (opts->x_pcrelative_literal_loads == 2
++ && TARGET_FIX_ERR_A53_843419)
++ aarch64_pcrelative_literal_loads = true;
+
+- /* In the tiny memory model it makes no sense
+- to disallow non PC relative literal pool loads
+- as many other things will break anyway. */
+- if (opts->x_nopcrelative_literal_loads
+- && (aarch64_cmodel == AARCH64_CMODEL_TINY
+- || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC))
+- aarch64_nopcrelative_literal_loads = false;
++ /* In the tiny memory model it makes no sense to disallow PC relative
++ literal pool loads. */
++ if (aarch64_cmodel == AARCH64_CMODEL_TINY
++ || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
++ aarch64_pcrelative_literal_loads = true;
++
++ /* When enabling the lower precision Newton series for the square root, also
++ enable it for the reciprocal square root, since the latter is an
++ intermediary step for the former. */
++ if (flag_mlow_precision_sqrt)
++ flag_mrecip_low_precision_sqrt = true;
+ }
+
+ /* 'Unpack' up the internal tuning structs and update the options
+@@ -9280,33 +9405,24 @@ aarch64_classify_symbol (rtx x, rtx offset)
+
+ if (GET_CODE (x) == SYMBOL_REF)
+ {
+- if (aarch64_cmodel == AARCH64_CMODEL_LARGE)
+- {
+- /* This is alright even in PIC code as the constant
+- pool reference is always PC relative and within
+- the same translation unit. */
+- if (nopcrelative_literal_loads
+- && CONSTANT_POOL_ADDRESS_P (x))
+- return SYMBOL_SMALL_ABSOLUTE;
+- else
+- return SYMBOL_FORCE_TO_MEM;
+- }
+-
+ if (aarch64_tls_symbol_p (x))
+ return aarch64_classify_tls_symbol (x);
+
+ switch (aarch64_cmodel)
+ {
+ case AARCH64_CMODEL_TINY:
+- /* When we retreive symbol + offset address, we have to make sure
++ /* When we retrieve symbol + offset address, we have to make sure
+ the offset does not cause overflow of the final address. But
+ we have no way of knowing the address of symbol at compile time
+ so we can't accurately say if the distance between the PC and
+ symbol + offset is outside the addressible range of +/-1M in the
+ TINY code model. So we rely on images not being greater than
+ 1M and cap the offset at 1M and anything beyond 1M will have to
+- be loaded using an alternative mechanism. */
+- if (SYMBOL_REF_WEAK (x)
++ be loaded using an alternative mechanism. Furthermore if the
++ symbol is a weak reference to something that isn't known to
++ resolve to a symbol in this module, then force to memory. */
++ if ((SYMBOL_REF_WEAK (x)
++ && !aarch64_symbol_binds_local_p (x))
+ || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ return SYMBOL_FORCE_TO_MEM;
+ return SYMBOL_TINY_ABSOLUTE;
+@@ -9314,7 +9430,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
+ case AARCH64_CMODEL_SMALL:
+ /* Same reasoning as the tiny code model, but the offset cap here is
+ 4G. */
+- if (SYMBOL_REF_WEAK (x)
++ if ((SYMBOL_REF_WEAK (x)
++ && !aarch64_symbol_binds_local_p (x))
+ || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
+ HOST_WIDE_INT_C (4294967264)))
+ return SYMBOL_FORCE_TO_MEM;
+@@ -9332,6 +9449,15 @@ aarch64_classify_symbol (rtx x, rtx offset)
+ ? SYMBOL_SMALL_GOT_28K : SYMBOL_SMALL_GOT_4G);
+ return SYMBOL_SMALL_ABSOLUTE;
+
++ case AARCH64_CMODEL_LARGE:
++ /* This is alright even in PIC code as the constant
++ pool reference is always PC relative and within
++ the same translation unit. */
++ if (CONSTANT_POOL_ADDRESS_P (x))
++ return SYMBOL_SMALL_ABSOLUTE;
++ else
++ return SYMBOL_FORCE_TO_MEM;
++
+ default:
+ gcc_unreachable ();
+ }
+@@ -9463,6 +9589,13 @@ aarch64_build_builtin_va_list (void)
+ FIELD_DECL, get_identifier ("__vr_offs"),
+ integer_type_node);
+
++ /* Tell tree-stdarg pass about our internal offset fields.
++ NOTE: va_list_gpr/fpr_counter_field are only used for tree comparision
++ purpose to identify whether the code is updating va_list internal
++ offset fields through irregular way. */
++ va_list_gpr_counter_field = f_groff;
++ va_list_fpr_counter_field = f_vroff;
++
+ DECL_ARTIFICIAL (f_stack) = 1;
+ DECL_ARTIFICIAL (f_grtop) = 1;
+ DECL_ARTIFICIAL (f_vrtop) = 1;
+@@ -9495,15 +9628,17 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
+ tree f_stack, f_grtop, f_vrtop, f_groff, f_vroff;
+ tree stack, grtop, vrtop, groff, vroff;
+ tree t;
+- int gr_save_area_size;
+- int vr_save_area_size;
++ int gr_save_area_size = cfun->va_list_gpr_size;
++ int vr_save_area_size = cfun->va_list_fpr_size;
+ int vr_offset;
+
+ cum = &crtl->args.info;
+- gr_save_area_size
+- = (NUM_ARG_REGS - cum->aapcs_ncrn) * UNITS_PER_WORD;
+- vr_save_area_size
+- = (NUM_FP_ARG_REGS - cum->aapcs_nvrn) * UNITS_PER_VREG;
++ if (cfun->va_list_gpr_size)
+ gr_save_area_size = MIN ((NUM_ARG_REGS - cum->aapcs_ncrn) * UNITS_PER_WORD,
+ cfun->va_list_gpr_size);
+ if (cfun->va_list_fpr_size)
@@ -1458,7 +4073,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
if (!TARGET_FLOAT)
{
-@@ -9832,7 +9959,8 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
+@@ -9832,7 +9967,8 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
{
CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
CUMULATIVE_ARGS local_cum;
@@ -1468,7 +4083,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* The caller has advanced CUM up to, but not beyond, the last named
argument. Advance a local copy of CUM past the last "real" named
-@@ -9840,9 +9968,14 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
+@@ -9840,9 +9976,14 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
local_cum = *cum;
aarch64_function_arg_advance (pack_cumulative_args(&local_cum), mode, type, true);
@@ -1486,7 +4101,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
if (!TARGET_FLOAT)
{
-@@ -9870,7 +10003,7 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
+@@ -9870,7 +10011,7 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
/* We can't use move_block_from_reg, because it will use
the wrong mode, storing D regs only. */
machine_mode mode = TImode;
@@ -1495,7 +4110,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* Set OFF to the offset from virtual_incoming_args_rtx of
the first vector register. The VR save area lies below
-@@ -9879,14 +10012,15 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
+@@ -9879,14 +10020,15 @@ aarch64_setup_incoming_varargs (cumulative_args_t cum_v, machine_mode mode,
STACK_BOUNDARY / BITS_PER_UNIT);
off -= vr_saved * UNITS_PER_VREG;
@@ -1513,7 +4128,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
off += UNITS_PER_VREG;
}
}
-@@ -10848,33 +10982,6 @@ aarch64_simd_emit_reg_reg_move (rtx *operands, enum machine_mode mode,
+@@ -10848,33 +10990,6 @@ aarch64_simd_emit_reg_reg_move (rtx *operands, enum machine_mode mode,
gen_rtx_REG (mode, rsrc + count - i - 1));
}
@@ -1547,7 +4162,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* Compute and return the length of aarch64_simd_reglist<mode>, where <mode> is
one of VSTRUCT modes: OI, CI, or XI. */
int
-@@ -11956,12 +12063,11 @@ aarch64_output_simd_mov_immediate (rtx const_vector,
+@@ -11956,12 +12071,11 @@ aarch64_output_simd_mov_immediate (rtx const_vector,
info.value = GEN_INT (0);
else
{
@@ -1561,7 +4176,136 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
if (lane_count == 1)
snprintf (templ, sizeof (templ), "fmov\t%%d0, %s", float_buf);
-@@ -13314,6 +13420,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+@@ -12195,6 +12309,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_trn2v4si; break;
+ case V2SImode: gen = gen_aarch64_trn2v2si; break;
+ case V2DImode: gen = gen_aarch64_trn2v2di; break;
++ case V4HFmode: gen = gen_aarch64_trn2v4hf; break;
++ case V8HFmode: gen = gen_aarch64_trn2v8hf; break;
+ case V4SFmode: gen = gen_aarch64_trn2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_trn2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_trn2v2df; break;
+@@ -12213,6 +12329,8 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_trn1v4si; break;
+ case V2SImode: gen = gen_aarch64_trn1v2si; break;
+ case V2DImode: gen = gen_aarch64_trn1v2di; break;
++ case V4HFmode: gen = gen_aarch64_trn1v4hf; break;
++ case V8HFmode: gen = gen_aarch64_trn1v8hf; break;
+ case V4SFmode: gen = gen_aarch64_trn1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_trn1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_trn1v2df; break;
+@@ -12278,6 +12396,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_uzp2v4si; break;
+ case V2SImode: gen = gen_aarch64_uzp2v2si; break;
+ case V2DImode: gen = gen_aarch64_uzp2v2di; break;
++ case V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
++ case V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
+ case V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_uzp2v2df; break;
+@@ -12296,6 +12416,8 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_uzp1v4si; break;
+ case V2SImode: gen = gen_aarch64_uzp1v2si; break;
+ case V2DImode: gen = gen_aarch64_uzp1v2di; break;
++ case V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
++ case V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
+ case V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_uzp1v2df; break;
+@@ -12366,6 +12488,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_zip2v4si; break;
+ case V2SImode: gen = gen_aarch64_zip2v2si; break;
+ case V2DImode: gen = gen_aarch64_zip2v2di; break;
++ case V4HFmode: gen = gen_aarch64_zip2v4hf; break;
++ case V8HFmode: gen = gen_aarch64_zip2v8hf; break;
+ case V4SFmode: gen = gen_aarch64_zip2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_zip2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_zip2v2df; break;
+@@ -12384,6 +12508,8 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
+ case V4SImode: gen = gen_aarch64_zip1v4si; break;
+ case V2SImode: gen = gen_aarch64_zip1v2si; break;
+ case V2DImode: gen = gen_aarch64_zip1v2di; break;
++ case V4HFmode: gen = gen_aarch64_zip1v4hf; break;
++ case V8HFmode: gen = gen_aarch64_zip1v8hf; break;
+ case V4SFmode: gen = gen_aarch64_zip1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_zip1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_zip1v2df; break;
+@@ -12428,6 +12554,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
+ case V8HImode: gen = gen_aarch64_extv8hi; break;
+ case V2SImode: gen = gen_aarch64_extv2si; break;
+ case V4SImode: gen = gen_aarch64_extv4si; break;
++ case V4HFmode: gen = gen_aarch64_extv4hf; break;
++ case V8HFmode: gen = gen_aarch64_extv8hf; break;
+ case V2SFmode: gen = gen_aarch64_extv2sf; break;
+ case V4SFmode: gen = gen_aarch64_extv4sf; break;
+ case V2DImode: gen = gen_aarch64_extv2di; break;
+@@ -12503,6 +12631,8 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d)
+ case V2SImode: gen = gen_aarch64_rev64v2si; break;
+ case V4SFmode: gen = gen_aarch64_rev64v4sf; break;
+ case V2SFmode: gen = gen_aarch64_rev64v2sf; break;
++ case V8HFmode: gen = gen_aarch64_rev64v8hf; break;
++ case V4HFmode: gen = gen_aarch64_rev64v4hf; break;
+ default:
+ return false;
+ }
+@@ -12746,24 +12876,6 @@ aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
+ return ret;
+ }
+
+-/* Implement target hook CANNOT_CHANGE_MODE_CLASS. */
+-bool
+-aarch64_cannot_change_mode_class (machine_mode from,
+- machine_mode to,
+- enum reg_class rclass)
+-{
+- /* We cannot allow word_mode subregs of full vector modes.
+- Otherwise the middle-end will assume it's ok to store to
+- (subreg:DI (reg:TI 100) 0) in order to modify only the low 64 bits
+- of the 128-bit register. However, after reload the subreg will
+- be dropped leaving a plain DImode store. See PR67609 for a more
+- detailed dicussion. In all other cases, we want to be permissive
+- and return false. */
+- return (reg_classes_intersect_p (FP_REGS, rclass)
+- && GET_MODE_SIZE (to) == UNITS_PER_WORD
+- && GET_MODE_SIZE (from) > UNITS_PER_WORD);
+-}
+-
+ rtx
+ aarch64_reverse_mask (enum machine_mode mode)
+ {
+@@ -12785,7 +12897,14 @@ aarch64_reverse_mask (enum machine_mode mode)
+ return force_reg (V16QImode, mask);
+ }
+
+-/* Implement MODES_TIEABLE_P. */
++/* Implement MODES_TIEABLE_P. In principle we should always return true.
++ However due to issues with register allocation it is preferable to avoid
++ tieing integer scalar and FP scalar modes. Executing integer operations
++ in general registers is better than treating them as scalar vector
++ operations. This reduces latency and avoids redundant int<->FP moves.
++ So tie modes if they are either the same class, or vector modes with
++ other vector modes, vector structs or any scalar mode.
++*/
+
+ bool
+ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
+@@ -12796,9 +12915,12 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
+ /* We specifically want to allow elements of "structure" modes to
+ be tieable to the structure. This more general condition allows
+ other rarer situations too. */
+- if (TARGET_SIMD
+- && aarch64_vector_mode_p (mode1)
+- && aarch64_vector_mode_p (mode2))
++ if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
++ return true;
++
++ /* Also allow any scalar modes with vectors. */
++ if (aarch64_vector_mode_supported_p (mode1)
++ || aarch64_vector_mode_supported_p (mode2))
+ return true;
+
+ return false;
+@@ -13314,6 +13436,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
return false;
}
@@ -1576,7 +4320,39 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
/* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET. */
-@@ -13886,13 +14000,13 @@ aarch64_promoted_type (const_tree t)
+@@ -13492,6 +13622,15 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
+ if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
+ return false;
+
++ /* If we have SImode and slow unaligned ldp,
++ check the alignment to be at least 8 byte. */
++ if (mode == SImode
++ && (aarch64_tune_params.extra_tuning_flags
++ & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
++ && !optimize_size
++ && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
++ return false;
++
+ /* Check if the addresses are in the form of [base+offset]. */
+ extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
+ if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
+@@ -13651,6 +13790,15 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
+ return false;
+ }
+
++ /* If we have SImode and slow unaligned ldp,
++ check the alignment to be at least 8 byte. */
++ if (mode == SImode
++ && (aarch64_tune_params.extra_tuning_flags
++ & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
++ && !optimize_size
++ && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
++ return false;
++
+ if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
+ rclass_1 = FP_REGS;
+ else
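The two hunks above add the same gate to LDP/STP formation: for SImode accesses on cores that set AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW, and when not optimizing for size, the pair is only formed if the first access is known to be at least 8-byte aligned. A minimal sketch of the kind of source pattern this affects (the struct layout and the quoted instruction are illustrative assumptions, not taken from the patch):

/* Two adjacent 32-bit loads that the back end may merge into a single
   ldp of W registers.  With the tuning flag above, the merge is only
   done when the compiler can prove 8-byte alignment of p->a.  */
struct pair { int a; int b; };

int sum_pair (const struct pair *p)
{
  return p->a + p->b;   /* candidate for: ldp w0, w1, [x0] */
}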
+@@ -13886,13 +14034,13 @@ aarch64_promoted_type (const_tree t)
/* Implement the TARGET_OPTAB_SUPPORTED_P hook. */
static bool
@@ -1592,7 +4368,18 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
default:
return true;
-@@ -14229,6 +14343,9 @@ aarch64_optab_supported_p (int op, machine_mode, machine_mode,
+@@ -14026,6 +14174,10 @@ aarch64_optab_supported_p (int op, machine_mode, machine_mode,
+ #undef TARGET_LEGITIMATE_CONSTANT_P
+ #define TARGET_LEGITIMATE_CONSTANT_P aarch64_legitimate_constant_p
+
++#undef TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT
++#define TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT \
++ aarch64_legitimize_address_displacement
++
+ #undef TARGET_LIBGCC_CMP_RETURN_MODE
+ #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
+
+@@ -14229,6 +14381,9 @@ aarch64_optab_supported_p (int op, machine_mode, machine_mode,
#undef TARGET_OPTAB_SUPPORTED_P
#define TARGET_OPTAB_SUPPORTED_P aarch64_optab_supported_p
@@ -1604,7 +4391,107 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
#include "gt-aarch64.h"
--- a/src/gcc/config/aarch64/aarch64.h
+++ b/src/gcc/config/aarch64/aarch64.h
-@@ -652,21 +652,6 @@ typedef struct
+@@ -132,9 +132,12 @@ extern unsigned aarch64_architecture_version;
+ #define AARCH64_FL_FP (1 << 1) /* Has FP. */
+ #define AARCH64_FL_CRYPTO (1 << 2) /* Has crypto. */
+ #define AARCH64_FL_CRC (1 << 3) /* Has CRC. */
+-/* ARMv8.1 architecture extensions. */
++/* ARMv8.1-A architecture extensions. */
+ #define AARCH64_FL_LSE (1 << 4) /* Has Large System Extensions. */
+-#define AARCH64_FL_V8_1 (1 << 5) /* Has ARMv8.1 extensions. */
++#define AARCH64_FL_V8_1 (1 << 5) /* Has ARMv8.1-A extensions. */
++/* ARMv8.2-A architecture extensions. */
++#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */
++#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */
+
+ /* Has FP and SIMD. */
+ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
+@@ -146,6 +149,8 @@ extern unsigned aarch64_architecture_version;
+ #define AARCH64_FL_FOR_ARCH8 (AARCH64_FL_FPSIMD)
+ #define AARCH64_FL_FOR_ARCH8_1 \
+ (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
++#define AARCH64_FL_FOR_ARCH8_2 \
++ (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2)
+
+ /* Macros to test ISA flags. */
+
+@@ -155,6 +160,8 @@ extern unsigned aarch64_architecture_version;
+ #define AARCH64_ISA_SIMD (aarch64_isa_flags & AARCH64_FL_SIMD)
+ #define AARCH64_ISA_LSE (aarch64_isa_flags & AARCH64_FL_LSE)
+ #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1)
++#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2)
++#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16)
+
+ /* Crypto is an optional extension to AdvSIMD. */
+ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
+@@ -165,6 +172,10 @@ extern unsigned aarch64_architecture_version;
+ /* Atomic instructions that can be enabled through the +lse extension. */
+ #define TARGET_LSE (AARCH64_ISA_LSE)
+
++/* ARMv8.2-A FP16 support that can be enabled through the +fp16 extension. */
++#define TARGET_FP_F16INST (TARGET_FLOAT && AARCH64_ISA_F16)
++#define TARGET_SIMD_F16INST (TARGET_SIMD && AARCH64_ISA_F16)
++
+ /* Make sure this is always defined so we don't have to check for ifdefs
+ but rather use normal ifs. */
+ #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT
+@@ -193,7 +204,7 @@ extern unsigned aarch64_architecture_version;
+ ((aarch64_fix_a53_err843419 == 2) \
+ ? TARGET_FIX_ERR_A53_843419_DEFAULT : aarch64_fix_a53_err843419)
+
+-/* ARMv8.1 Adv.SIMD support. */
++/* ARMv8.1-A Adv.SIMD support. */
+ #define TARGET_SIMD_RDMA (TARGET_SIMD && AARCH64_ISA_RDMA)
+
+ /* Standard register usage. */
+@@ -539,11 +550,14 @@ struct GTY (()) aarch64_frame
+ STACK_BOUNDARY. */
+ HOST_WIDE_INT saved_varargs_size;
+
++ /* The size of the saved callee-save int/FP registers. */
++
+ HOST_WIDE_INT saved_regs_size;
+- /* Padding if needed after the all the callee save registers have
+- been saved. */
+- HOST_WIDE_INT padding0;
+- HOST_WIDE_INT hardfp_offset; /* HARD_FRAME_POINTER_REGNUM */
++
++ /* Offset from the base of the frame (incomming SP) to the
++ top of the locals area. This value is always a multiple of
++ STACK_BOUNDARY. */
++ HOST_WIDE_INT locals_offset;
+
+ /* Offset from the base of the frame (incomming SP) to the
+ hard_frame_pointer. This value is always a multiple of
+@@ -553,12 +567,25 @@ struct GTY (()) aarch64_frame
+ /* The size of the frame. This value is the offset from base of the
+ * frame (incomming SP) to the stack_pointer. This value is always
+ * a multiple of STACK_BOUNDARY. */
++ HOST_WIDE_INT frame_size;
++
++ /* The size of the initial stack adjustment before saving callee-saves. */
++ HOST_WIDE_INT initial_adjust;
++
++ /* The writeback value when pushing callee-save registers.
++ It is zero when no push is used. */
++ HOST_WIDE_INT callee_adjust;
++
++ /* The offset from SP to the callee-save registers after initial_adjust.
++ It may be non-zero if no push is used (ie. callee_adjust == 0). */
++ HOST_WIDE_INT callee_offset;
++
++ /* The size of the stack adjustment after saving callee-saves. */
++ HOST_WIDE_INT final_adjust;
+
+ unsigned wb_candidate1;
+ unsigned wb_candidate2;
+
+- HOST_WIDE_INT frame_size;
+-
+ bool laid_out;
+ };
+
+@@ -652,21 +679,6 @@ typedef struct
#define CONSTANT_ADDRESS_P(X) aarch64_constant_address_p(X)
@@ -1626,10 +4513,27 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
#define REGNO_OK_FOR_BASE_P(REGNO) \
aarch64_regno_ok_for_base_p (REGNO, true)
-@@ -845,7 +830,7 @@ do { \
- #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
- aarch64_cannot_change_mode_class (FROM, TO, CLASS)
+@@ -722,7 +734,12 @@ do { \
+ #define USE_STORE_PRE_INCREMENT(MODE) 0
+ #define USE_STORE_PRE_DECREMENT(MODE) 0
+
+-/* ?? #define WORD_REGISTER_OPERATIONS */
++/* WORD_REGISTER_OPERATIONS does not hold for AArch64.
++ The assigned word_mode is DImode but operations narrower than SImode
++ behave as 32-bit operations if using the W-form of the registers rather
++ than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
++ expects. */
++#define WORD_REGISTER_OPERATIONS 0
+
+ /* Define if loading from memory in MODE, an integral mode narrower than
+ BITS_PER_WORD will either zero-extend or sign-extend. The value of this
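The replacement comment above documents why WORD_REGISTER_OPERATIONS is now explicitly 0: narrow operations are carried out on the 32-bit W form of the registers, so they do not behave as full word_mode (DImode) operations. A small illustration of the behaviour being described; the assembly in the comment is an assumption about typical AArch64 output, not part of the patch:

/* A sub-SImode addition is performed on W registers, i.e. as a
   32-bit operation, roughly:
     add  w0, w0, w1
     and  w0, w0, 65535
   The upper half of the X register is not defined the way
   WORD_REGISTER_OPERATIONS would require.  */
unsigned short add_us (unsigned short a, unsigned short b)
{
  return a + b;
}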
+@@ -842,10 +859,7 @@ do { \
+ extern void __aarch64_sync_cache_range (void *, void *); \
+ __aarch64_sync_cache_range (beg, end)
+-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
+- aarch64_cannot_change_mode_class (FROM, TO, CLASS)
+-
-#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
+#define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
@@ -1662,7 +4566,36 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
UNSPEC_USHL_2S
UNSPEC_VSTRUCTDUMMY
UNSPEC_SP_SET
-@@ -1178,11 +1182,12 @@
+@@ -855,13 +859,6 @@
+ || aarch64_is_noplt_call_p (callee)))
+ XEXP (operands[0], 0) = force_reg (Pmode, callee);
+
+- /* FIXME: This is a band-aid. Need to analyze why expand_expr_addr_expr
+- is generating an SImode symbol reference. See PR 64971. */
+- if (TARGET_ILP32
+- && GET_CODE (XEXP (operands[0], 0)) == SYMBOL_REF
+- && GET_MODE (XEXP (operands[0], 0)) == SImode)
+- XEXP (operands[0], 0) = convert_memory_address (Pmode,
+- XEXP (operands[0], 0));
+ if (operands[2] == NULL_RTX)
+ operands[2] = const0_rtx;
+
+@@ -893,14 +890,6 @@
+ || aarch64_is_noplt_call_p (callee)))
+ XEXP (operands[1], 0) = force_reg (Pmode, callee);
+
+- /* FIXME: This is a band-aid. Need to analyze why expand_expr_addr_expr
+- is generating an SImode symbol reference. See PR 64971. */
+- if (TARGET_ILP32
+- && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF
+- && GET_MODE (XEXP (operands[1], 0)) == SImode)
+- XEXP (operands[1], 0) = convert_memory_address (Pmode,
+- XEXP (operands[1], 0));
+-
+ if (operands[3] == NULL_RTX)
+ operands[3] = const0_rtx;
+
+@@ -1178,11 +1167,12 @@
)
(define_insn "*movhf_aarch64"
@@ -1677,7 +4610,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
mov\\t%0.h[0], %w1
umov\\t%w0, %1.h[0]
mov\\t%0.h[0], %1.h[0]
-@@ -1191,18 +1196,18 @@
+@@ -1191,18 +1181,18 @@
ldrh\\t%w0, %1
strh\\t%w1, %0
mov\\t%w0, %w1"
@@ -1701,7 +4634,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
fmov\\t%s0, %w1
fmov\\t%w0, %s1
fmov\\t%s0, %s1
-@@ -1212,16 +1217,18 @@
+@@ -1212,16 +1202,18 @@
ldr\\t%w0, %1
str\\t%w1, %0
mov\\t%w0, %w1"
@@ -1724,7 +4657,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
fmov\\t%d0, %x1
fmov\\t%x0, %d1
fmov\\t%d0, %d1
-@@ -1231,8 +1238,9 @@
+@@ -1231,8 +1223,9 @@
ldr\\t%x0, %1
str\\t%x1, %0
mov\\t%x0, %x1"
@@ -1736,7 +4669,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*movtf_aarch64"
-@@ -1257,7 +1265,6 @@
+@@ -1257,7 +1250,6 @@
[(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
f_loadd,f_stored,load2,store2,store2")
(set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
@@ -1744,7 +4677,51 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
(set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
)
-@@ -1783,7 +1790,7 @@
+@@ -1570,10 +1562,10 @@
+ (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
+ ""
+ "@
+- uxt<SHORT:size>\t%<GPI:w>0, %w1
++ and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
+ ldr<SHORT:size>\t%w0, %1
+ ldr\t%<SHORT:size>0, %1"
+- [(set_attr "type" "extend,load1,load1")]
++ [(set_attr "type" "logic_imm,load1,load1")]
+ )
+
+ (define_expand "<optab>qihi2"
+@@ -1582,16 +1574,26 @@
+ ""
+ )
+
+-(define_insn "*<optab>qihi2_aarch64"
++(define_insn "*extendqihi2_aarch64"
+ [(set (match_operand:HI 0 "register_operand" "=r,r")
+- (ANY_EXTEND:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
++ (sign_extend:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
+ ""
+ "@
+- <su>xtb\t%w0, %w1
+- <ldrxt>b\t%w0, %1"
++ sxtb\t%w0, %w1
++ ldrsb\t%w0, %1"
+ [(set_attr "type" "extend,load1")]
+ )
+
++(define_insn "*zero_extendqihi2_aarch64"
++ [(set (match_operand:HI 0 "register_operand" "=r,r")
++ (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
++ ""
++ "@
++ and\t%w0, %w1, 255
++ ldrb\t%w0, %1"
++ [(set_attr "type" "logic_imm,load1")]
++)
++
+ ;; -------------------------------------------------------------------
+ ;; Simple arithmetic
+ ;; -------------------------------------------------------------------
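The hunks above switch register-to-register zero extension of QImode/HImode values from uxtb/uxth to an immediate AND (and split the qihi extend pattern so only the sign-extending form keeps sxtb). A small example of the C-level operation this covers; the quoted assembly reflects the new pattern and is an assumption about typical output, not text from the patch:

/* Zero extension of a narrow value already held in a register.  With
   this change it is emitted as an immediate AND, e.g.
     and  w0, w0, 255
   rather than
     uxtb w0, w0
   Memory operands still use ldrb/ldrh as before.  */
unsigned int widen (unsigned char c)
{
  return c;
}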
+@@ -1783,7 +1785,7 @@
"aarch64_zero_extend_const_eq (<DWI>mode, operands[2],
<MODE>mode, operands[1])"
"@
@@ -1753,7 +4730,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
cmp\\t%<w>0, #%n1"
[(set_attr "type" "alus_imm")]
)
-@@ -1815,11 +1822,11 @@
+@@ -1815,11 +1817,11 @@
"aarch64_zero_extend_const_eq (<DWI>mode, operands[3],
<MODE>mode, operands[2])"
"@
@@ -1767,7 +4744,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
(define_insn "add<mode>3_compareC"
[(set (reg:CC_C CC_REGNUM)
(ne:CC_C
-@@ -3422,7 +3429,9 @@
+@@ -3422,7 +3424,9 @@
(LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r")
(match_operand:SI 2 "aarch64_logical_operand" "r,K"))))]
""
@@ -1778,7 +4755,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
[(set_attr "type" "logic_reg,logic_imm")]
)
-@@ -3435,7 +3444,9 @@
+@@ -3435,7 +3439,9 @@
(set (match_operand:GPI 0 "register_operand" "=r,r")
(and:GPI (match_dup 1) (match_dup 2)))]
""
@@ -1789,7 +4766,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
[(set_attr "type" "logics_reg,logics_imm")]
)
-@@ -3449,7 +3460,9 @@
+@@ -3449,7 +3455,9 @@
(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
""
@@ -1800,7 +4777,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
[(set_attr "type" "logics_reg,logics_imm")]
)
-@@ -3775,16 +3788,23 @@
+@@ -3775,16 +3783,23 @@
[(set_attr "type" "rbit")]
)
@@ -1833,7 +4810,26 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
(define_insn "*and<mode>_compare0"
[(set (reg:CC_NZ CC_REGNUM)
-@@ -3803,7 +3823,9 @@
+@@ -3796,6 +3811,18 @@
+ [(set_attr "type" "alus_imm")]
+ )
+
++(define_insn "*ands<mode>_compare0"
++ [(set (reg:CC_NZ CC_REGNUM)
++ (compare:CC_NZ
++ (zero_extend:GPI (match_operand:SHORT 1 "register_operand" "r"))
++ (const_int 0)))
++ (set (match_operand:GPI 0 "register_operand" "=r")
++ (zero_extend:GPI (match_dup 1)))]
++ ""
++ "ands\\t%<GPI:w>0, %<GPI:w>1, <short_mask>"
++ [(set_attr "type" "alus_imm")]
++)
++
+ (define_insn "*and<mode>3nr_compare0"
+ [(set (reg:CC_NZ CC_REGNUM)
+ (compare:CC_NZ
+@@ -3803,7 +3830,9 @@
(match_operand:GPI 1 "aarch64_logical_operand" "r,<lconst>"))
(const_int 0)))]
""
@@ -1844,7 +4840,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
[(set_attr "type" "logics_reg,logics_imm")]
)
-@@ -3869,22 +3891,16 @@
+@@ -3869,22 +3898,16 @@
(define_expand "ashl<mode>3"
[(set (match_operand:SHORT 0 "register_operand")
(ashift:SHORT (match_operand:SHORT 1 "register_operand")
@@ -1873,7 +4869,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
}
)
-@@ -3933,33 +3949,35 @@
+@@ -3933,33 +3956,35 @@
;; Logical left shift using SISD or Integer instruction
(define_insn "*aarch64_ashl_sisd_or_int_<mode>3"
@@ -1921,7 +4917,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_split
-@@ -3994,18 +4012,19 @@
+@@ -3994,18 +4019,19 @@
;; Arithmetic right shift using SISD or Integer instruction
(define_insn "*aarch64_ashr_sisd_or_int_<mode>3"
@@ -1946,7 +4942,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_split
-@@ -4097,21 +4116,25 @@
+@@ -4097,21 +4123,25 @@
[(set (match_operand:GPI 0 "register_operand" "=r,r")
(rotatert:GPI
(match_operand:GPI 1 "register_operand" "r,r")
@@ -1980,7 +4976,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*<optab><mode>3_insn"
-@@ -4135,7 +4158,7 @@
+@@ -4135,7 +4165,7 @@
"UINTVAL (operands[3]) < GET_MODE_BITSIZE (<MODE>mode) &&
(UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE (<MODE>mode))"
"extr\\t%<w>0, %<w>1, %<w>2, %4"
@@ -1989,7 +4985,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
;; There are no canonicalisation rules for ashift and lshiftrt inside an ior
-@@ -4150,7 +4173,7 @@
+@@ -4150,7 +4180,7 @@
&& (UINTVAL (operands[3]) + UINTVAL (operands[4])
== GET_MODE_BITSIZE (<MODE>mode))"
"extr\\t%<w>0, %<w>1, %<w>2, %4"
@@ -1998,7 +4994,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
;; zero_extend version of the above
-@@ -4164,7 +4187,7 @@
+@@ -4164,7 +4194,7 @@
"UINTVAL (operands[3]) < 32 &&
(UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)"
"extr\\t%w0, %w1, %w2, %4"
@@ -2007,7 +5003,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*extrsi5_insn_uxtw_alt"
-@@ -4177,7 +4200,7 @@
+@@ -4177,7 +4207,7 @@
"UINTVAL (operands[3]) < 32 &&
(UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)"
"extr\\t%w0, %w1, %w2, %4"
@@ -2016,10 +5012,118 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
)
(define_insn "*ror<mode>3_insn"
-@@ -4608,6 +4631,36 @@
+@@ -4357,9 +4387,7 @@
+ (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand 2 "const_int_operand" "n"))
+ (match_operand 3 "const_int_operand" "n")))]
+- "(INTVAL (operands[2]) < (<GPI:sizen>))
+- && exact_log2 ((INTVAL (operands[3]) >> INTVAL (operands[2])) + 1) >= 0
+- && (INTVAL (operands[3]) & ((1 << INTVAL (operands[2])) - 1)) == 0"
++ "aarch64_mask_and_shift_for_ubfiz_p (<MODE>mode, operands[3], operands[2])"
+ "ubfiz\\t%<w>0, %<w>1, %2, %P3"
+ [(set_attr "type" "bfm")]
+ )
+@@ -4429,22 +4457,23 @@
+ ;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
+
+ (define_insn "<frint_pattern><mode>2"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
+ FRINT))]
+ "TARGET_FLOAT"
+ "frint<frint_suffix>\\t%<s>0, %<s>1"
+- [(set_attr "type" "f_rint<s>")]
++ [(set_attr "type" "f_rint<stype>")]
+ )
+
+ ;; frcvt floating-point round to integer and convert standard patterns.
+ ;; Expands to lbtrunc, lceil, lfloor, lround.
+-(define_insn "l<fcvt_pattern><su_optab><GPF:mode><GPI:mode>2"
++(define_insn "l<fcvt_pattern><su_optab><GPF_F16:mode><GPI:mode>2"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+- (FIXUORS:GPI (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")]
+- FCVT)))]
++ (FIXUORS:GPI
++ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")]
++ FCVT)))]
+ "TARGET_FLOAT"
+- "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF:s>1"
++ "fcvt<frint_suffix><su>\\t%<GPI:w>0, %<GPF_F16:s>1"
+ [(set_attr "type" "f_cvtf2i")]
+ )
+
+@@ -4470,23 +4499,24 @@
+ ;; fma - no throw
+
+ (define_insn "fma<mode>4"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (fma:GPF (match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")
+- (match_operand:GPF 3 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")
++ (match_operand:GPF_F16 3 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
+- [(set_attr "type" "fmac<s>")]
++ [(set_attr "type" "fmac<stype>")]
+ )
+
+ (define_insn "fnma<mode>4"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
+- (match_operand:GPF 2 "register_operand" "w")
+- (match_operand:GPF 3 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (fma:GPF_F16
++ (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
++ (match_operand:GPF_F16 2 "register_operand" "w")
++ (match_operand:GPF_F16 3 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fmsub\\t%<s>0, %<s>1, %<s>2, %<s>3"
+- [(set_attr "type" "fmac<s>")]
++ [(set_attr "type" "fmac<stype>")]
+ )
+
+ (define_insn "fms<mode>4"
+@@ -4572,19 +4602,11 @@
+ [(set_attr "type" "f_cvt")]
+ )
+
+-(define_insn "fix_trunc<GPF:mode><GPI:mode>2"
++(define_insn "<optab>_trunc<GPF_F16:mode><GPI:mode>2"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+- (fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
++ (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
+ "TARGET_FLOAT"
+- "fcvtzs\\t%<GPI:w>0, %<GPF:s>1"
+- [(set_attr "type" "f_cvtf2i")]
+-)
+-
+-(define_insn "fixuns_trunc<GPF:mode><GPI:mode>2"
+- [(set (match_operand:GPI 0 "register_operand" "=r")
+- (unsigned_fix:GPI (match_operand:GPF 1 "register_operand" "w")))]
+- "TARGET_FLOAT"
+- "fcvtzu\\t%<GPI:w>0, %<GPF:s>1"
++ "fcvtz<su>\t%<GPI:w>0, %<GPF_F16:s>1"
+ [(set_attr "type" "f_cvtf2i")]
+ )
+
+@@ -4608,38 +4630,116 @@
[(set_attr "type" "f_cvti2f")]
)
++(define_insn "<optab><mode>hf2"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
++ "TARGET_FP_F16INST"
++ "<su_optab>cvtf\t%h0, %<w>1"
++ [(set_attr "type" "f_cvti2f")]
++)
++
+;; Convert between fixed-point and floating-point (scalar modes)
+
+(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3"
@@ -2050,44 +5154,144 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+ (set_attr "simd" "*, yes")]
+)
+
- ;; -------------------------------------------------------------------
- ;; Floating-point arithmetic
- ;; -------------------------------------------------------------------
-@@ -4662,11 +4715,22 @@
- [(set_attr "type" "fmul<s>")]
- )
-
--(define_insn "div<mode>3"
-+(define_expand "div<mode>3"
-+ [(set (match_operand:GPF 0 "register_operand")
-+ (div:GPF (match_operand:GPF 1 "general_operand")
-+ (match_operand:GPF 2 "register_operand")))]
-+ "TARGET_SIMD"
-+{
-+ if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
++(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf<mode>3"
++ [(set (match_operand:GPI 0 "register_operand" "=r")
++ (unspec:GPI [(match_operand:HF 1 "register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ FCVT_F2FIXED))]
++ "TARGET_FP_F16INST"
++ "<FCVT_F2FIXED:fcvt_fixed_insn>\t%<GPI:w>0, %h1, #%2"
++ [(set_attr "type" "f_cvtf2i")]
++)
++
++(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn><mode>hf3"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (unspec:HF [(match_operand:GPI 1 "register_operand" "r")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ FCVT_FIXED2F))]
++ "TARGET_FP_F16INST"
++ "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %<GPI:w>1, #%2"
++ [(set_attr "type" "f_cvti2f")]
++)
++
++(define_insn "<FCVT_F2FIXED:fcvt_fixed_insn>hf3"
++ [(set (match_operand:HI 0 "register_operand" "=w")
++ (unspec:HI [(match_operand:HF 1 "register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ FCVT_F2FIXED))]
++ "TARGET_SIMD"
++ "<FCVT_F2FIXED:fcvt_fixed_insn>\t%h0, %h1, #%2"
++ [(set_attr "type" "neon_fp_to_int_s")]
++)
++
++(define_insn "<FCVT_FIXED2F:fcvt_fixed_insn>hi3"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (unspec:HF [(match_operand:HI 1 "register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ FCVT_FIXED2F))]
++ "TARGET_SIMD"
++ "<FCVT_FIXED2F:fcvt_fixed_insn>\t%h0, %h1, #%2"
++ [(set_attr "type" "neon_int_to_fp_s")]
++)
++
+ ;; -------------------------------------------------------------------
+ ;; Floating-point arithmetic
+ ;; -------------------------------------------------------------------
+
+ (define_insn "add<mode>3"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (plus:GPF
+- (match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (plus:GPF_F16
++ (match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fadd\\t%<s>0, %<s>1, %<s>2"
+- [(set_attr "type" "fadd<s>")]
++ [(set_attr "type" "fadd<stype>")]
+ )
+
+ (define_insn "sub<mode>3"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (minus:GPF
+- (match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (minus:GPF_F16
++ (match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fsub\\t%<s>0, %<s>1, %<s>2"
+- [(set_attr "type" "fadd<s>")]
++ [(set_attr "type" "fadd<stype>")]
+ )
+
+ (define_insn "mul<mode>3"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (mult:GPF
+- (match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (mult:GPF_F16
++ (match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fmul\\t%<s>0, %<s>1, %<s>2"
+- [(set_attr "type" "fmul<s>")]
++ [(set_attr "type" "fmul<stype>")]
+ )
+
+ (define_insn "*fnmul<mode>3"
+@@ -4662,38 +4762,58 @@
+ [(set_attr "type" "fmul<s>")]
+ )
+
+-(define_insn "div<mode>3"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (div:GPF
+- (match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")))]
++(define_expand "div<mode>3"
++ [(set (match_operand:GPF_F16 0 "register_operand")
++ (div:GPF_F16 (match_operand:GPF_F16 1 "general_operand")
++ (match_operand:GPF_F16 2 "register_operand")))]
++ "TARGET_SIMD"
++{
++ if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+ DONE;
+
+ operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div<mode>3"
- [(set (match_operand:GPF 0 "register_operand" "=w")
-- (div:GPF
-- (match_operand:GPF 1 "register_operand" "w")
-- (match_operand:GPF 2 "register_operand" "w")))]
-+ (div:GPF (match_operand:GPF 1 "register_operand" "w")
-+ (match_operand:GPF 2 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (div:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")))]
"TARGET_FLOAT"
"fdiv\\t%<s>0, %<s>1, %<s>2"
- [(set_attr "type" "fdiv<s>")]
-@@ -4680,7 +4744,16 @@
- [(set_attr "type" "ffarith<s>")]
+- [(set_attr "type" "fdiv<s>")]
++ [(set_attr "type" "fdiv<stype>")]
+ )
+
+ (define_insn "neg<mode>2"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (neg:GPF (match_operand:GPF 1 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fneg\\t%<s>0, %<s>1"
+- [(set_attr "type" "ffarith<s>")]
++ [(set_attr "type" "ffarith<stype>")]
)
-(define_insn "sqrt<mode>2"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
+(define_expand "sqrt<mode>2"
-+ [(set (match_operand:GPF 0 "register_operand")
-+ (sqrt:GPF (match_operand:GPF 1 "register_operand")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
+ "TARGET_FLOAT"
+{
+ if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
@@ -2095,10 +5299,53 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+})
+
+(define_insn "*sqrt<mode>2"
- [(set (match_operand:GPF 0 "register_operand" "=w")
- (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (sqrt:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fsqrt\\t%<s>0, %<s>1"
+- [(set_attr "type" "fsqrt<s>")]
++ [(set_attr "type" "fsqrt<stype>")]
+ )
+
+ (define_insn "abs<mode>2"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (abs:GPF (match_operand:GPF 1 "register_operand" "w")))]
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (abs:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")))]
+ "TARGET_FLOAT"
+ "fabs\\t%<s>0, %<s>1"
+- [(set_attr "type" "ffarith<s>")]
++ [(set_attr "type" "ffarith<stype>")]
+ )
+
+ ;; Given that smax/smin do not specify the result when either input is NaN,
+@@ -4718,15 +4838,17 @@
+ [(set_attr "type" "f_minmax<s>")]
+ )
+
+-;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+-(define_insn "<fmaxmin><mode>3"
+- [(set (match_operand:GPF 0 "register_operand" "=w")
+- (unspec:GPF [(match_operand:GPF 1 "register_operand" "w")
+- (match_operand:GPF 2 "register_operand" "w")]
+- FMAXMIN))]
++;; Scalar forms for fmax, fmin, fmaxnm, fminnm.
++;; fmaxnm and fminnm are used for the fmax<mode>3 standard pattern names,
++;; which implement the IEEE fmax ()/fmin () functions.
++(define_insn "<maxmin_uns><mode>3"
++ [(set (match_operand:GPF_F16 0 "register_operand" "=w")
++ (unspec:GPF_F16 [(match_operand:GPF_F16 1 "register_operand" "w")
++ (match_operand:GPF_F16 2 "register_operand" "w")]
++ FMAXMIN_UNS))]
"TARGET_FLOAT"
-@@ -5191,7 +5264,7 @@
+- "<fmaxmin_op>\\t%<s>0, %<s>1, %<s>2"
+- [(set_attr "type" "f_minmax<s>")]
++ "<maxmin_uns_op>\\t%<s>0, %<s>1, %<s>2"
++ [(set_attr "type" "f_minmax<stype>")]
+ )
+
+ ;; For copysign (x, y), we want to generate:
+@@ -5191,7 +5313,7 @@
UNSPEC_SP_TEST))
(clobber (match_scratch:PTR 3 "=&r"))]
""
@@ -2109,7 +5356,13 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
--- a/src/gcc/config/aarch64/aarch64.opt
+++ b/src/gcc/config/aarch64/aarch64.opt
-@@ -151,5 +151,19 @@ PC relative literal loads.
+@@ -146,10 +146,24 @@ EnumValue
+ Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
+
+ mpc-relative-literal-loads
+-Target Report Save Var(nopcrelative_literal_loads) Init(2) Save
++Target Report Save Var(pcrelative_literal_loads) Init(2) Save
+ PC relative literal loads.
mlow-precision-recip-sqrt
Common Var(flag_mrecip_low_precision_sqrt) Optimization
@@ -2131,7087 +5384,122043 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
+Enable the division approximation. Enabling this reduces
+precision of division results to about 16 bits for
+single precision and to 32 bits for double precision.
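Together with the new div<mode>3 expander that calls aarch64_emit_approx_div, the option text above describes a reciprocal-approximation sequence for floating-point division (roughly 16 significant bits for single precision, 32 for double). A hedged sketch of the kind of code such an option targets; the exact option name (presumably -mlow-precision-div, mirroring -mlow-precision-recip-sqrt) is not visible in this excerpt:

/* Division-heavy inner loop.  Built with the approximation option
   enabled (e.g. -O3 -mlow-precision-div, name assumed), the fdiv in
   the loop may be replaced by a Newton-Raphson reciprocal sequence,
   trading precision for throughput.  */
void scale (float *restrict out, const float *restrict in,
            const float *restrict div, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i] / div[i];
}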
---- a/src/gcc/config/aarch64/arm_neon.h
-+++ b/src/gcc/config/aarch64/arm_neon.h
-@@ -5440,17 +5440,6 @@ vabaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
- return result;
- }
-
--__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
--vabd_f32 (float32x2_t a, float32x2_t b)
--{
-- float32x2_t result;
-- __asm__ ("fabd %0.2s, %1.2s, %2.2s"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
- vabd_s8 (int8x8_t a, int8x8_t b)
- {
-@@ -5517,17 +5506,6 @@ vabd_u32 (uint32x2_t a, uint32x2_t b)
- return result;
- }
-
--__extension__ static __inline float64_t __attribute__ ((__always_inline__))
--vabdd_f64 (float64_t a, float64_t b)
--{
-- float64_t result;
-- __asm__ ("fabd %d0, %d1, %d2"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
- vabdl_high_s8 (int8x16_t a, int8x16_t b)
- {
-@@ -5660,28 +5638,6 @@ vabdl_u32 (uint32x2_t a, uint32x2_t b)
- return result;
- }
-
--__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
--vabdq_f32 (float32x4_t a, float32x4_t b)
--{
-- float32x4_t result;
-- __asm__ ("fabd %0.4s, %1.4s, %2.4s"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
--vabdq_f64 (float64x2_t a, float64x2_t b)
--{
-- float64x2_t result;
-- __asm__ ("fabd %0.2d, %1.2d, %2.2d"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
- vabdq_s8 (int8x16_t a, int8x16_t b)
- {
-@@ -5748,17 +5704,6 @@ vabdq_u32 (uint32x4_t a, uint32x4_t b)
- return result;
- }
-
--__extension__ static __inline float32_t __attribute__ ((__always_inline__))
--vabds_f32 (float32_t a, float32_t b)
--{
-- float32_t result;
-- __asm__ ("fabd %s0, %s1, %s2"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int16_t __attribute__ ((__always_inline__))
- vaddlv_s8 (int8x8_t a)
- {
-@@ -6025,246 +5970,6 @@ vaddlvq_u32 (uint32x4_t a)
- result; \
- })
-
--#define vcvt_n_f32_s32(a, b) \
-- __extension__ \
-- ({ \
-- int32x2_t a_ = (a); \
-- float32x2_t result; \
-- __asm__ ("scvtf %0.2s, %1.2s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvt_n_f32_u32(a, b) \
-- __extension__ \
-- ({ \
-- uint32x2_t a_ = (a); \
-- float32x2_t result; \
-- __asm__ ("ucvtf %0.2s, %1.2s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvt_n_s32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32x2_t a_ = (a); \
-- int32x2_t result; \
-- __asm__ ("fcvtzs %0.2s, %1.2s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvt_n_u32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32x2_t a_ = (a); \
-- uint32x2_t result; \
-- __asm__ ("fcvtzu %0.2s, %1.2s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvtd_n_f64_s64(a, b) \
-- __extension__ \
-- ({ \
-- int64_t a_ = (a); \
-- float64_t result; \
-- __asm__ ("scvtf %d0,%d1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvtd_n_f64_u64(a, b) \
-- __extension__ \
-- ({ \
-- uint64_t a_ = (a); \
-- float64_t result; \
-- __asm__ ("ucvtf %d0,%d1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvtd_n_s64_f64(a, b) \
-- __extension__ \
-- ({ \
-- float64_t a_ = (a); \
-- int64_t result; \
-- __asm__ ("fcvtzs %d0,%d1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvtd_n_u64_f64(a, b) \
-- __extension__ \
-- ({ \
-- float64_t a_ = (a); \
-- uint64_t result; \
-- __asm__ ("fcvtzu %d0,%d1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
--
--#define vcvtq_n_f32_s32(a, b) \
-- __extension__ \
-- ({ \
-- int32x4_t a_ = (a); \
-- float32x4_t result; \
-- __asm__ ("scvtf %0.4s, %1.4s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
+--- /dev/null
++++ b/src/gcc/config/aarch64/arm_fp16.h
+@@ -0,0 +1,579 @@
++/* ARM FP16 scalar intrinsics include file.
++
++ Copyright (C) 2016 Free Software Foundation, Inc.
++ Contributed by ARM Ltd.
++
++ This file is part of GCC.
++
++ GCC is free software; you can redistribute it and/or modify it
++ under the terms of the GNU General Public License as published
++ by the Free Software Foundation; either version 3, or (at your
++ option) any later version.
++
++ GCC is distributed in the hope that it will be useful, but WITHOUT
++ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
++ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
++ License for more details.
++
++ Under Section 7 of GPL version 3, you are granted additional
++ permissions described in the GCC Runtime Library Exception, version
++ 3.1, as published by the Free Software Foundation.
++
++ You should have received a copy of the GNU General Public License and
++ a copy of the GCC Runtime Library Exception along with this program;
++ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
++ <http://www.gnu.org/licenses/>. */
++
++#ifndef _AARCH64_FP16_H_
++#define _AARCH64_FP16_H_
++
++#include <stdint.h>
++
++#pragma GCC push_options
++#pragma GCC target ("arch=armv8.2-a+fp16")
++
++typedef __fp16 float16_t;
++
++/* ARMv8.2-A FP16 one operand scalar intrinsics. */
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vabsh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_abshf (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vceqzh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_cmeqhf_uss (__a, 0.0f);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcgezh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_cmgehf_uss (__a, 0.0f);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcgtzh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_cmgthf_uss (__a, 0.0f);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vclezh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_cmlehf_uss (__a, 0.0f);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcltzh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_cmlthf_uss (__a, 0.0f);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_s16 (int16_t __a)
++{
++ return __builtin_aarch64_floathihf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_s32 (int32_t __a)
++{
++ return __builtin_aarch64_floatsihf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_s64 (int64_t __a)
++{
++ return __builtin_aarch64_floatdihf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_u16 (uint16_t __a)
++{
++ return __builtin_aarch64_floatunshihf_us (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_u32 (uint32_t __a)
++{
++ return __builtin_aarch64_floatunssihf_us (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_u64 (uint64_t __a)
++{
++ return __builtin_aarch64_floatunsdihf_us (__a);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvth_s16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fix_trunchfhi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvth_s32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fix_trunchfsi (__a);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvth_s64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fix_trunchfdi (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvth_u16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fixuns_trunchfhi_us (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvth_u32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fixuns_trunchfsi_us (__a);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvth_u64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_fixuns_trunchfdi_us (__a);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvtah_s16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lroundhfhi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtah_s32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lroundhfsi (__a);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvtah_s64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lroundhfdi (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvtah_u16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lrounduhfhi_us (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtah_u32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lrounduhfsi_us (__a);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvtah_u64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lrounduhfdi_us (__a);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvtmh_s16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfloorhfhi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtmh_s32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfloorhfsi (__a);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvtmh_s64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfloorhfdi (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvtmh_u16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lflooruhfhi_us (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtmh_u32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lflooruhfsi_us (__a);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvtmh_u64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lflooruhfdi_us (__a);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvtnh_s16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnhfhi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtnh_s32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnhfsi (__a);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvtnh_s64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnhfdi (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvtnh_u16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnuhfhi_us (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtnh_u32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnuhfsi_us (__a);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvtnh_u64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lfrintnuhfdi_us (__a);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvtph_s16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceilhfhi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtph_s32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceilhfsi (__a);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvtph_s64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceilhfdi (__a);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvtph_u16_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceiluhfhi_us (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtph_u32_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceiluhfsi_us (__a);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvtph_u64_f16 (float16_t __a)
++{
++ return __builtin_aarch64_lceiluhfdi_us (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vnegh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_neghf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrecpeh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_frecpehf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrecpxh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_frecpxhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_btrunchf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndah_f16 (float16_t __a)
++{
++ return __builtin_aarch64_roundhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndih_f16 (float16_t __a)
++{
++ return __builtin_aarch64_nearbyinthf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndmh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_floorhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndnh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_frintnhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndph_f16 (float16_t __a)
++{
++ return __builtin_aarch64_ceilhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndxh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_rinthf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrsqrteh_f16 (float16_t __a)
++{
++ return __builtin_aarch64_rsqrtehf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vsqrth_f16 (float16_t __a)
++{
++ return __builtin_aarch64_sqrthf (__a);
++}
++
++/* ARMv8.2-A FP16 two operands scalar intrinsics. */
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vaddh_f16 (float16_t __a, float16_t __b)
++{
++ return __a + __b;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vabdh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fabdhf (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcageh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_facgehf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcagth_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_facgthf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcaleh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_faclehf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcalth_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_faclthf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vceqh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_cmeqhf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcgeh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_cmgehf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcgth_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_cmgthf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcleh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_cmlehf_uss (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vclth_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_cmlthf_uss (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_s16 (int16_t __a, const int __b)
++{
++ return __builtin_aarch64_scvtfhi (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_s32 (int32_t __a, const int __b)
++{
++ return __builtin_aarch64_scvtfsihf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_s64 (int64_t __a, const int __b)
++{
++ return __builtin_aarch64_scvtfdihf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_u16 (uint16_t __a, const int __b)
++{
++ return __builtin_aarch64_ucvtfhi_sus (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_u32 (uint32_t __a, const int __b)
++{
++ return __builtin_aarch64_ucvtfsihf_sus (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_u64 (uint64_t __a, const int __b)
++{
++ return __builtin_aarch64_ucvtfdihf_sus (__a, __b);
++}
++
++__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++vcvth_n_s16_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzshf (__a, __b);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvth_n_s32_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzshfsi (__a, __b);
++}
++
++__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++vcvth_n_s64_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzshfdi (__a, __b);
++}
++
++__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++vcvth_n_u16_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzuhf_uss (__a, __b);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvth_n_u32_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzuhfsi_uss (__a, __b);
++}
++
++__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++vcvth_n_u64_f16 (float16_t __a, const int __b)
++{
++ return __builtin_aarch64_fcvtzuhfdi_uss (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vdivh_f16 (float16_t __a, float16_t __b)
++{
++ return __a / __b;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmaxh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fmaxhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmaxnmh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fmaxhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vminh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fminhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vminnmh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fminhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmulh_f16 (float16_t __a, float16_t __b)
++{
++ return __a * __b;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmulxh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_fmulxhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrecpsh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_frecpshf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrsqrtsh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_aarch64_rsqrtshf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vsubh_f16 (float16_t __a, float16_t __b)
++{
++ return __a - __b;
++}
++
++/* ARMv8.2-A FP16 three operands scalar intrinsics. */
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
++{
++ return __builtin_aarch64_fmahf (__b, __c, __a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
++{
++ return __builtin_aarch64_fnmahf (__b, __c, __a);
++}
++
++#pragma GCC pop_options
++
++#endif
+--- a/src/gcc/config/aarch64/arm_neon.h
++++ b/src/gcc/config/aarch64/arm_neon.h
+@@ -466,6 +466,8 @@ typedef struct poly16x8x4_t
+ #define __aarch64_vdup_lane_any(__size, __q, __a, __b) \
+ vdup##__q##_n_##__size (__aarch64_vget_lane_any (__a, __b))
+
++#define __aarch64_vdup_lane_f16(__a, __b) \
++ __aarch64_vdup_lane_any (f16, , __a, __b)
+ #define __aarch64_vdup_lane_f32(__a, __b) \
+ __aarch64_vdup_lane_any (f32, , __a, __b)
+ #define __aarch64_vdup_lane_f64(__a, __b) \
+@@ -492,6 +494,8 @@ typedef struct poly16x8x4_t
+ __aarch64_vdup_lane_any (u64, , __a, __b)
+
+ /* __aarch64_vdup_laneq internal macros. */
++#define __aarch64_vdup_laneq_f16(__a, __b) \
++ __aarch64_vdup_lane_any (f16, , __a, __b)
+ #define __aarch64_vdup_laneq_f32(__a, __b) \
+ __aarch64_vdup_lane_any (f32, , __a, __b)
+ #define __aarch64_vdup_laneq_f64(__a, __b) \
+@@ -518,6 +522,8 @@ typedef struct poly16x8x4_t
+ __aarch64_vdup_lane_any (u64, , __a, __b)
+
+ /* __aarch64_vdupq_lane internal macros. */
++#define __aarch64_vdupq_lane_f16(__a, __b) \
++ __aarch64_vdup_lane_any (f16, q, __a, __b)
+ #define __aarch64_vdupq_lane_f32(__a, __b) \
+ __aarch64_vdup_lane_any (f32, q, __a, __b)
+ #define __aarch64_vdupq_lane_f64(__a, __b) \
+@@ -544,6 +550,8 @@ typedef struct poly16x8x4_t
+ __aarch64_vdup_lane_any (u64, q, __a, __b)
+
+ /* __aarch64_vdupq_laneq internal macros. */
++#define __aarch64_vdupq_laneq_f16(__a, __b) \
++ __aarch64_vdup_lane_any (f16, q, __a, __b)
+ #define __aarch64_vdupq_laneq_f32(__a, __b) \
+ __aarch64_vdup_lane_any (f32, q, __a, __b)
+ #define __aarch64_vdupq_laneq_f64(__a, __b) \
+@@ -601,535 +609,619 @@ typedef struct poly16x8x4_t
+ })
+
+ /* vadd */
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_f64 (float64x1_t __a, float64x1_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vadd_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a + __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_saddlv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_saddlv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_saddlv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uaddlv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uaddlv4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_uaddlv2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_saddl2v16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_saddl2v8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_saddl2v4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uaddl2v16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uaddl2v8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddl_high_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_uaddl2v4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_s8 (int16x8_t __a, int8x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_saddwv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_s16 (int32x4_t __a, int16x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_saddwv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_s32 (int64x2_t __a, int32x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_saddwv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_u8 (uint16x8_t __a, uint8x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uaddwv8qi ((int16x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_u16 (uint32x4_t __a, uint16x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uaddwv4hi ((int32x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_u32 (uint64x2_t __a, uint32x2_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_uaddwv2si ((int64x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_s8 (int16x8_t __a, int8x16_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_saddw2v16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_s16 (int32x4_t __a, int16x8_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_saddw2v8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_s32 (int64x2_t __a, int32x4_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_saddw2v4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_u8 (uint16x8_t __a, uint8x16_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uaddw2v16qi ((int16x8_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_u16 (uint32x4_t __a, uint16x8_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uaddw2v8hi ((int32x4_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddw_high_u32 (uint64x2_t __a, uint32x4_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_uaddw2v4si ((int64x2_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhadd_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_shaddv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_shaddv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return (uint8x16_t) __builtin_aarch64_uhaddv16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uhaddv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uhaddv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_srhaddv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_srhaddv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_srhaddv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_urhaddv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_urhaddv4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhadd_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_urhaddv2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_srhaddv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_srhaddv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_srhaddv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return (uint8x16_t) __builtin_aarch64_urhaddv16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_urhaddv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrhaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_urhaddv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a,
+ (int64x2_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_raddhnv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_raddhnv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_raddhnv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_raddhnv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_raddhnv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_raddhnv2di ((int64x2_t) __a,
+ (int64x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_s16 (int8x8_t __a, int16x8_t __b, int16x8_t __c)
+ {
+ return (int8x16_t) __builtin_aarch64_addhn2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_s32 (int16x4_t __a, int32x4_t __b, int32x4_t __c)
+ {
+ return (int16x8_t) __builtin_aarch64_addhn2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_s64 (int32x2_t __a, int64x2_t __b, int64x2_t __c)
+ {
+ return (int32x4_t) __builtin_aarch64_addhn2v2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ {
+ return (uint8x16_t) __builtin_aarch64_addhn2v8hi ((int8x8_t) __a,
+@@ -1137,7 +1229,8 @@ vaddhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ (int16x8_t) __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ {
+ return (uint16x8_t) __builtin_aarch64_addhn2v4si ((int16x4_t) __a,
+@@ -1145,7 +1238,8 @@ vaddhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ (int32x4_t) __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ {
+ return (uint32x4_t) __builtin_aarch64_addhn2v2di ((int32x2_t) __a,
+@@ -1153,25 +1247,29 @@ vaddhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ (int64x2_t) __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_s16 (int8x8_t __a, int16x8_t __b, int16x8_t __c)
+ {
+ return (int8x16_t) __builtin_aarch64_raddhn2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_s32 (int16x4_t __a, int32x4_t __b, int32x4_t __c)
+ {
+ return (int16x8_t) __builtin_aarch64_raddhn2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_s64 (int32x2_t __a, int64x2_t __b, int64x2_t __c)
+ {
+ return (int32x4_t) __builtin_aarch64_raddhn2v2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ {
+ return (uint8x16_t) __builtin_aarch64_raddhn2v8hi ((int8x8_t) __a,
+@@ -1179,7 +1277,8 @@ vraddhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ (int16x8_t) __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ {
+ return (uint16x8_t) __builtin_aarch64_raddhn2v4si ((int16x4_t) __a,
+@@ -1187,7 +1286,8 @@ vraddhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ (int32x4_t) __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vraddhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ {
+ return (uint32x4_t) __builtin_aarch64_raddhn2v2di ((int32x2_t) __a,
+@@ -1195,1101 +1295,1280 @@ vraddhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ (int64x2_t) __c);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vdiv_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ return __a / __b;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vdiv_f64 (float64x1_t __a, float64x1_t __b)
+ {
+ return __a / __b;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vdivq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+ return __a / __b;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vdivq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+ return __a / __b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_f64 (float64x1_t __a, float64x1_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmul_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+ return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a * __b;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vmulq_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+ return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vand_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vandq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a & __b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorr_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorrq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a | __b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veor_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ veorq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a ^ __b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbic_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vbicq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a & ~__b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vorn_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vornq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a | ~__b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_f64 (float64x1_t __a, float64x1_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsub_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __a - __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_usublv2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_ssubl2v16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_ssubl2v8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_ssubl2v4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_usubl2v16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_usubl2v8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubl_high_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_usubl2v4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_s8 (int16x8_t __a, int8x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_ssubwv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_s16 (int32x4_t __a, int16x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_ssubwv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_s32 (int64x2_t __a, int32x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_ssubwv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_u8 (uint16x8_t __a, uint8x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_usubwv8qi ((int16x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_u16 (uint32x4_t __a, uint16x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_usubwv4hi ((int32x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_u32 (uint64x2_t __a, uint32x2_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_usubwv2si ((int64x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_s8 (int16x8_t __a, int8x16_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_ssubw2v16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_s16 (int32x4_t __a, int16x8_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_ssubw2v8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_s32 (int64x2_t __a, int32x4_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_ssubw2v4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_u8 (uint16x8_t __a, uint8x16_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_usubw2v16qi ((int16x8_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_u16 (uint32x4_t __a, uint16x8_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_usubw2v8hi ((int32x4_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubw_high_u32 (uint64x2_t __a, uint32x4_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_usubw2v4si ((int64x2_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_sqaddv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_sqaddv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_sqaddv2si (__a, __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return (int64x1_t) {__builtin_aarch64_sqadddi (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __builtin_aarch64_uqaddv8qi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x8_t)__builtin_aarch64_shsubv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_shsubv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_shsubv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_uhsubv8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_uhsubv4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsub_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_uhsubv2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_shsubv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_shsubv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_shsubv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return (uint8x16_t) __builtin_aarch64_uhsubv16qi ((int8x16_t) __a,
+ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_uhsubv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vhsubq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_uhsubv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_subhnv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_subhnv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_subhnv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_subhnv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_subhnv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_subhnv2di ((int64x2_t) __a,
+ (int64x2_t) __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_rsubhnv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_rsubhnv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_rsubhnv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return (uint8x8_t) __builtin_aarch64_rsubhnv8hi ((int16x8_t) __a,
+ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return (uint16x4_t) __builtin_aarch64_rsubhnv4si ((int32x4_t) __a,
+ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return (uint32x2_t) __builtin_aarch64_rsubhnv2di ((int64x2_t) __a,
+ (int64x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_s16 (int8x8_t __a, int16x8_t __b, int16x8_t __c)
+ {
+ return (int8x16_t) __builtin_aarch64_rsubhn2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_s32 (int16x4_t __a, int32x4_t __b, int32x4_t __c)
+ {
+ return (int16x8_t) __builtin_aarch64_rsubhn2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_s64 (int32x2_t __a, int64x2_t __b, int64x2_t __c)
+ {
+ return (int32x4_t) __builtin_aarch64_rsubhn2v2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ {
+ return (uint8x16_t) __builtin_aarch64_rsubhn2v8hi ((int8x8_t) __a,
+@@ -2297,7 +2576,8 @@ vrsubhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ (int16x8_t) __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ {
+ return (uint16x8_t) __builtin_aarch64_rsubhn2v4si ((int16x4_t) __a,
+@@ -2305,7 +2585,8 @@ vrsubhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ (int32x4_t) __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vrsubhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ {
+ return (uint32x4_t) __builtin_aarch64_rsubhn2v2di ((int32x2_t) __a,
+@@ -2313,25 +2594,29 @@ vrsubhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ (int64x2_t) __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_s16 (int8x8_t __a, int16x8_t __b, int16x8_t __c)
+ {
+ return (int8x16_t) __builtin_aarch64_subhn2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_s32 (int16x4_t __a, int32x4_t __b, int32x4_t __c)
+ {
+ return (int16x8_t) __builtin_aarch64_subhn2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_s64 (int32x2_t __a, int64x2_t __b, int64x2_t __c)
+ {
+ return (int32x4_t) __builtin_aarch64_subhn2v2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ {
+ return (uint8x16_t) __builtin_aarch64_subhn2v8hi ((int8x8_t) __a,
+@@ -2339,7 +2624,8 @@ vsubhn_high_u16 (uint8x8_t __a, uint16x8_t __b, uint16x8_t __c)
+ (int16x8_t) __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ {
+ return (uint16x8_t) __builtin_aarch64_subhn2v4si ((int16x4_t) __a,
+@@ -2347,7 +2633,8 @@ vsubhn_high_u32 (uint16x4_t __a, uint32x4_t __b, uint32x4_t __c)
+ (int32x4_t) __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsubhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ {
+ return (uint32x4_t) __builtin_aarch64_subhn2v2di ((int32x2_t) __a,
+@@ -2355,373 +2642,435 @@ vsubhn_high_u64 (uint32x2_t __a, uint64x2_t __b, uint64x2_t __c)
+ (int64x2_t) __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __builtin_aarch64_uqaddv4hi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __builtin_aarch64_uqaddv2si_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqadd_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return (uint64x1_t) {__builtin_aarch64_uqadddi_uuu (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_sqaddv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_sqaddv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_sqaddv4si (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_sqaddv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __builtin_aarch64_uqaddv16qi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __builtin_aarch64_uqaddv8hi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __builtin_aarch64_uqaddv4si_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqaddq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __builtin_aarch64_uqaddv2di_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x8_t) __builtin_aarch64_sqsubv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_sqsubv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_sqsubv2si (__a, __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return (int64x1_t) {__builtin_aarch64_sqsubdi (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return __builtin_aarch64_uqsubv8qi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return __builtin_aarch64_uqsubv4hi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return __builtin_aarch64_uqsubv2si_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsub_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return (uint64x1_t) {__builtin_aarch64_uqsubdi_uuu (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_sqsubv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_sqsubv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_sqsubv4si (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ return (int64x2_t) __builtin_aarch64_sqsubv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ return __builtin_aarch64_uqsubv16qi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ return __builtin_aarch64_uqsubv8hi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ return __builtin_aarch64_uqsubv4si_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqsubq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ return __builtin_aarch64_uqsubv2di_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqneg_s8 (int8x8_t __a)
+ {
+ return (int8x8_t) __builtin_aarch64_sqnegv8qi (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqneg_s16 (int16x4_t __a)
+ {
+ return (int16x4_t) __builtin_aarch64_sqnegv4hi (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqneg_s32 (int32x2_t __a)
+ {
+ return (int32x2_t) __builtin_aarch64_sqnegv2si (__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqneg_s64 (int64x1_t __a)
+ {
+ return (int64x1_t) {__builtin_aarch64_sqnegdi (__a[0])};
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqnegq_s8 (int8x16_t __a)
+ {
+ return (int8x16_t) __builtin_aarch64_sqnegv16qi (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqnegq_s16 (int16x8_t __a)
+ {
+ return (int16x8_t) __builtin_aarch64_sqnegv8hi (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqnegq_s32 (int32x4_t __a)
+ {
+ return (int32x4_t) __builtin_aarch64_sqnegv4si (__a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabs_s8 (int8x8_t __a)
+ {
+ return (int8x8_t) __builtin_aarch64_sqabsv8qi (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabs_s16 (int16x4_t __a)
+ {
+ return (int16x4_t) __builtin_aarch64_sqabsv4hi (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabs_s32 (int32x2_t __a)
+ {
+ return (int32x2_t) __builtin_aarch64_sqabsv2si (__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabs_s64 (int64x1_t __a)
+ {
+ return (int64x1_t) {__builtin_aarch64_sqabsdi (__a[0])};
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabsq_s8 (int8x16_t __a)
+ {
+ return (int8x16_t) __builtin_aarch64_sqabsv16qi (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabsq_s16 (int16x8_t __a)
+ {
+ return (int16x8_t) __builtin_aarch64_sqabsv8hi (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqabsq_s32 (int32x4_t __a)
+ {
+ return (int32x4_t) __builtin_aarch64_sqabsv4si (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqdmulh_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_sqdmulhv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqdmulh_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_sqdmulhv2si (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqdmulhq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_sqdmulhv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqdmulhq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_sqdmulhv4si (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqrdmulh_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x4_t) __builtin_aarch64_sqrdmulhv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqrdmulh_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x2_t) __builtin_aarch64_sqrdmulhv2si (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqrdmulhq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_sqrdmulhv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_s8 (uint64_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_s16 (uint64_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_s32 (uint64_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_s64 (uint64_t __a)
+ {
+ return (int64x1_t) {__a};
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_f16 (uint64_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_f32 (uint64_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_u8 (uint64_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_u16 (uint64_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_u32 (uint64_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_u64 (uint64_t __a)
+ {
+ return (uint64x1_t) {__a};
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_f64 (uint64_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_p8 (uint64_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcreate_p16 (uint64_t __a)
+ {
+ return (poly16x4_t) __a;
+@@ -2729,79 +3078,92 @@ vcreate_p16 (uint64_t __a)
+
+ /* vget_lane */
+
+-__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_f16 (float16x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_f32 (float32x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_f64 (float64x1_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_p8 (poly8x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_p16 (poly16x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_s8 (int8x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_s16 (int16x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_s32 (int32x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_s64 (int64x1_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_u8 (uint8x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_u16 (uint16x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_u32 (uint32x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_lane_u64 (uint64x1_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+@@ -2809,79 +3171,92 @@ vget_lane_u64 (uint64x1_t __a, const int __b)
+
+ /* vgetq_lane */
+
+-__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_f16 (float16x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_f32 (float32x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_f64 (float64x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_p8 (poly8x16_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_p16 (poly16x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_s8 (int8x16_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_s16 (int16x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_s32 (int32x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_s64 (int64x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_u8 (uint8x16_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_u16 (uint16x8_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_u32 (uint32x4_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vgetq_lane_u64 (uint64x2_t __a, const int __b)
+ {
+ return __aarch64_vget_lane_any (__a, __b);
+@@ -2889,1873 +3264,2185 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b)
+
+ /* vreinterpret */
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_f16 (float16x4_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_f64 (float64x1_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_s8 (int8x8_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_s16 (int16x4_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_s32 (int32x2_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_s64 (int64x1_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_f32 (float32x2_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_u8 (uint8x8_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_u16 (uint16x4_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_u32 (uint32x2_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_u64 (uint64x1_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p8_p16 (poly16x4_t __a)
+ {
+ return (poly8x8_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_f64 (float64x2_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_s8 (int8x16_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_s16 (int16x8_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_s32 (int32x4_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_s64 (int64x2_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_f16 (float16x8_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_f32 (float32x4_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_u8 (uint8x16_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_u16 (uint16x8_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_u32 (uint32x4_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_u64 (uint64x2_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p8_p16 (poly16x8_t __a)
+ {
+ return (poly8x16_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_f16 (float16x4_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_f64 (float64x1_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_s8 (int8x8_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_s16 (int16x4_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_s32 (int32x2_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_s64 (int64x1_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_f32 (float32x2_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_u8 (uint8x8_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_u16 (uint16x4_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_u32 (uint32x2_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_u64 (uint64x1_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_p16_p8 (poly8x8_t __a)
+ {
+ return (poly16x4_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_f64 (float64x2_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_s8 (int8x16_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_s16 (int16x8_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_s32 (int32x4_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_s64 (int64x2_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_f16 (float16x8_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_f32 (float32x4_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_u8 (uint8x16_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_u16 (uint16x8_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_u32 (uint32x4_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_u64 (uint64x2_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_p16_p8 (poly8x16_t __a)
+ {
+ return (poly16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_f64 (float64x1_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_s8 (int8x8_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_s16 (int16x4_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_s32 (int32x2_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_s64 (int64x1_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_f32 (float32x2_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_u8 (uint8x8_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_u16 (uint16x4_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_u32 (uint32x2_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_u64 (uint64x1_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_p8 (poly8x8_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f16_p16 (poly16x4_t __a)
+ {
+ return (float16x4_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_f64 (float64x2_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_s8 (int8x16_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_s16 (int16x8_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_s32 (int32x4_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_s64 (int64x2_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_f32 (float32x4_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_u8 (uint8x16_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_u16 (uint16x8_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_u32 (uint32x4_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_u64 (uint64x2_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_p8 (poly8x16_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f16_p16 (poly16x8_t __a)
+ {
+ return (float16x8_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_f16 (float16x4_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_f64 (float64x1_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_s8 (int8x8_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_s16 (int16x4_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_s32 (int32x2_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_s64 (int64x1_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_u8 (uint8x8_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_u16 (uint16x4_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_u32 (uint32x2_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_u64 (uint64x1_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_p8 (poly8x8_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f32_p16 (poly16x4_t __a)
+ {
+ return (float32x2_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_f16 (float16x8_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_f64 (float64x2_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_s8 (int8x16_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_s16 (int16x8_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_s32 (int32x4_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_s64 (int64x2_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_u8 (uint8x16_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_u16 (uint16x8_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_u32 (uint32x4_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_u64 (uint64x2_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_p8 (poly8x16_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f32_p16 (poly16x8_t __a)
+ {
+ return (float32x4_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_f16 (float16x4_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_f32 (float32x2_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_p8 (poly8x8_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_p16 (poly16x4_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_s8 (int8x8_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_s16 (int16x4_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_s32 (int32x2_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_s64 (int64x1_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_u8 (uint8x8_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_u16 (uint16x4_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_u32 (uint32x2_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_f64_u64 (uint64x1_t __a)
+ {
+ return (float64x1_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_f16 (float16x8_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_f32 (float32x4_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_p8 (poly8x16_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_p16 (poly16x8_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_s8 (int8x16_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_s16 (int16x8_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_s32 (int32x4_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_s64 (int64x2_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_u8 (uint8x16_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_u16 (uint16x8_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_u32 (uint32x4_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_f64_u64 (uint64x2_t __a)
+ {
+ return (float64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_f16 (float16x4_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_f64 (float64x1_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_s8 (int8x8_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_s16 (int16x4_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_s32 (int32x2_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_f32 (float32x2_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_u8 (uint8x8_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_u16 (uint16x4_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_u32 (uint32x2_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_u64 (uint64x1_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_p8 (poly8x8_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s64_p16 (poly16x4_t __a)
+ {
+ return (int64x1_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_f64 (float64x2_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_s8 (int8x16_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_s16 (int16x8_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_s32 (int32x4_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_f16 (float16x8_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_f32 (float32x4_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_u8 (uint8x16_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_u16 (uint16x8_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_u32 (uint32x4_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_u64 (uint64x2_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_p8 (poly8x16_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s64_p16 (poly16x8_t __a)
+ {
+ return (int64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_f16 (float16x4_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_f64 (float64x1_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_s8 (int8x8_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_s16 (int16x4_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_s32 (int32x2_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_s64 (int64x1_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_f32 (float32x2_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_u8 (uint8x8_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_u16 (uint16x4_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_u32 (uint32x2_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_p8 (poly8x8_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u64_p16 (poly16x4_t __a)
+ {
+ return (uint64x1_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_f64 (float64x2_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_s8 (int8x16_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_s16 (int16x8_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_s32 (int32x4_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_s64 (int64x2_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_f16 (float16x8_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_f32 (float32x4_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_u8 (uint8x16_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_u16 (uint16x8_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_u32 (uint32x4_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_p8 (poly8x16_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u64_p16 (poly16x8_t __a)
+ {
+ return (uint64x2_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_f16 (float16x4_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_f64 (float64x1_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_s16 (int16x4_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_s32 (int32x2_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_s64 (int64x1_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_f32 (float32x2_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_u8 (uint8x8_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_u16 (uint16x4_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_u32 (uint32x2_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_u64 (uint64x1_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_p8 (poly8x8_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s8_p16 (poly16x4_t __a)
+ {
+ return (int8x8_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_f64 (float64x2_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_s16 (int16x8_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_s32 (int32x4_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_s64 (int64x2_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_f16 (float16x8_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_f32 (float32x4_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_u8 (uint8x16_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_u16 (uint16x8_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_u32 (uint32x4_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_u64 (uint64x2_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_p8 (poly8x16_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s8_p16 (poly16x8_t __a)
+ {
+ return (int8x16_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_f16 (float16x4_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_f64 (float64x1_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_s8 (int8x8_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_s32 (int32x2_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_s64 (int64x1_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_f32 (float32x2_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_u8 (uint8x8_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_u16 (uint16x4_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_u32 (uint32x2_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_u64 (uint64x1_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_p8 (poly8x8_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s16_p16 (poly16x4_t __a)
+ {
+ return (int16x4_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_f64 (float64x2_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_s8 (int8x16_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_s32 (int32x4_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_s64 (int64x2_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_f16 (float16x8_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_f32 (float32x4_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_u8 (uint8x16_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_u16 (uint16x8_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_u32 (uint32x4_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_u64 (uint64x2_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_p8 (poly8x16_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s16_p16 (poly16x8_t __a)
+ {
+ return (int16x8_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_f16 (float16x4_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_f64 (float64x1_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_s8 (int8x8_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_s16 (int16x4_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_s64 (int64x1_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_f32 (float32x2_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_u8 (uint8x8_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_u16 (uint16x4_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_u32 (uint32x2_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_u64 (uint64x1_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_p8 (poly8x8_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_s32_p16 (poly16x4_t __a)
+ {
+ return (int32x2_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_f64 (float64x2_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_s8 (int8x16_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_s16 (int16x8_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_s64 (int64x2_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_f16 (float16x8_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_f32 (float32x4_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_u8 (uint8x16_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_u16 (uint16x8_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_u32 (uint32x4_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_u64 (uint64x2_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_p8 (poly8x16_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_s32_p16 (poly16x8_t __a)
+ {
+ return (int32x4_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_f16 (float16x4_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_f64 (float64x1_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_s8 (int8x8_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_s16 (int16x4_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_s32 (int32x2_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_s64 (int64x1_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_f32 (float32x2_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_u16 (uint16x4_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_u32 (uint32x2_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_u64 (uint64x1_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_p8 (poly8x8_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u8_p16 (poly16x4_t __a)
+ {
+ return (uint8x8_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_f64 (float64x2_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_s8 (int8x16_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_s16 (int16x8_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_s32 (int32x4_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_s64 (int64x2_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_f16 (float16x8_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_f32 (float32x4_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_u16 (uint16x8_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_u32 (uint32x4_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_u64 (uint64x2_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_p8 (poly8x16_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u8_p16 (poly16x8_t __a)
+ {
+ return (uint8x16_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_f16 (float16x4_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_f64 (float64x1_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_s8 (int8x8_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_s16 (int16x4_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_s32 (int32x2_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_s64 (int64x1_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_f32 (float32x2_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_u8 (uint8x8_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_u32 (uint32x2_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_u64 (uint64x1_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_p8 (poly8x8_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u16_p16 (poly16x4_t __a)
+ {
+ return (uint16x4_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_f64 (float64x2_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_s8 (int8x16_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_s16 (int16x8_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_s32 (int32x4_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_s64 (int64x2_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_f16 (float16x8_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_f32 (float32x4_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_u8 (uint8x16_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_u32 (uint32x4_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_u64 (uint64x2_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_p8 (poly8x16_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u16_p16 (poly16x8_t __a)
+ {
+ return (uint16x8_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_f16 (float16x4_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_f64 (float64x1_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_s8 (int8x8_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_s16 (int16x4_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_s32 (int32x2_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_s64 (int64x1_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_f32 (float32x2_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_u8 (uint8x8_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_u16 (uint16x4_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_u64 (uint64x1_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_p8 (poly8x8_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpret_u32_p16 (poly16x4_t __a)
+ {
+ return (uint32x2_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_f64 (float64x2_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_s8 (int8x16_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_s16 (int16x8_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_s32 (int32x4_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_s64 (int64x2_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_f16 (float16x8_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_f32 (float32x4_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_u8 (uint8x16_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_u16 (uint16x8_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_u64 (uint64x2_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_p8 (poly8x16_t __a)
+ {
+ return (uint32x4_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vreinterpretq_u32_p16 (poly16x8_t __a)
+ {
+ return (uint32x4_t) __a;
+@@ -4763,79 +5450,92 @@ vreinterpretq_u32_p16 (poly16x8_t __a)
+
+ /* vset_lane */
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_f16 (float16_t __elem, float16x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_f32 (float32_t __elem, float32x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_f64 (float64_t __elem, float64x1_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_p8 (poly8_t __elem, poly8x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_p16 (poly16_t __elem, poly16x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_s8 (int8_t __elem, int8x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_s16 (int16_t __elem, int16x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_s32 (int32_t __elem, int32x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_s64 (int64_t __elem, int64x1_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_u8 (uint8_t __elem, uint8x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_u16 (uint16_t __elem, uint16x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_u32 (uint32_t __elem, uint32x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vset_lane_u64 (uint64_t __elem, uint64x1_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+@@ -4843,79 +5543,92 @@ vset_lane_u64 (uint64_t __elem, uint64x1_t __vec, const int __index)
+
+ /* vsetq_lane */
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_f16 (float16_t __elem, float16x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_f32 (float32_t __elem, float32x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_f64 (float64_t __elem, float64x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_p8 (poly8_t __elem, poly8x16_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_p16 (poly16_t __elem, poly16x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_s8 (int8_t __elem, int8x16_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_s16 (int16_t __elem, int16x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_s32 (int32_t __elem, int32x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_s64 (int64_t __elem, int64x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_u8 (uint8_t __elem, uint8x16_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_u16 (uint16_t __elem, uint16x8_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_u32 (uint32_t __elem, uint32x4_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vsetq_lane_u64 (uint64_t __elem, uint64x2_t __vec, const int __index)
+ {
+ return __aarch64_vset_lane_any (__elem, __vec, __index);
+@@ -4926,79 +5639,92 @@ vsetq_lane_u64 (uint64_t __elem, uint64x2_t __vec, const int __index)
+ uint64x1_t lo = vcreate_u64 (vgetq_lane_u64 (tmp, 0)); \
+ return vreinterpret_##__TYPE##_u64 (lo);
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_f16 (float16x8_t __a)
+ {
+ __GET_LOW (f16);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_f32 (float32x4_t __a)
+ {
+ __GET_LOW (f32);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_f64 (float64x2_t __a)
+ {
+ return (float64x1_t) {vgetq_lane_f64 (__a, 0)};
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_p8 (poly8x16_t __a)
+ {
+ __GET_LOW (p8);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_p16 (poly16x8_t __a)
+ {
+ __GET_LOW (p16);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_s8 (int8x16_t __a)
+ {
+ __GET_LOW (s8);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_s16 (int16x8_t __a)
+ {
+ __GET_LOW (s16);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_s32 (int32x4_t __a)
+ {
+ __GET_LOW (s32);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_s64 (int64x2_t __a)
+ {
+ __GET_LOW (s64);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_u8 (uint8x16_t __a)
+ {
+ __GET_LOW (u8);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_u16 (uint16x8_t __a)
+ {
+ __GET_LOW (u16);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_u32 (uint32x4_t __a)
+ {
+ __GET_LOW (u32);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_low_u64 (uint64x2_t __a)
+ {
+ return vcreate_u64 (vgetq_lane_u64 (__a, 0));
+@@ -5011,73 +5737,85 @@ vget_low_u64 (uint64x2_t __a)
+ uint64x1_t hi = vcreate_u64 (vgetq_lane_u64 (tmp, 1)); \
+ return vreinterpret_##__TYPE##_u64 (hi);
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_f16 (float16x8_t __a)
+ {
+ __GET_HIGH (f16);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_f32 (float32x4_t __a)
+ {
+ __GET_HIGH (f32);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_f64 (float64x2_t __a)
+ {
+ __GET_HIGH (f64);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_p8 (poly8x16_t __a)
+ {
+ __GET_HIGH (p8);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_p16 (poly16x8_t __a)
+ {
+ __GET_HIGH (p16);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_s8 (int8x16_t __a)
+ {
+ __GET_HIGH (s8);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_s16 (int16x8_t __a)
+ {
+ __GET_HIGH (s16);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_s32 (int32x4_t __a)
+ {
+ __GET_HIGH (s32);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_s64 (int64x2_t __a)
+ {
+ __GET_HIGH (s64);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_u8 (uint8x16_t __a)
+ {
+ __GET_HIGH (u8);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_u16 (uint16x8_t __a)
+ {
+ __GET_HIGH (u16);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_u32 (uint32x4_t __a)
+ {
+ __GET_HIGH (u32);
+@@ -5085,89 +5823,103 @@ vget_high_u32 (uint32x4_t __a)
+
+ #undef __GET_HIGH
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vget_high_u64 (uint64x2_t __a)
+ {
+ return vcreate_u64 (vgetq_lane_u64 (__a, 1));
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ return (int8x16_t) __builtin_aarch64_combinev8qi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ return (int16x8_t) __builtin_aarch64_combinev4hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ return (int32x4_t) __builtin_aarch64_combinev2si (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_s64 (int64x1_t __a, int64x1_t __b)
+ {
+ return __builtin_aarch64_combinedi (__a[0], __b[0]);
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_f16 (float16x4_t __a, float16x4_t __b)
+ {
+ return __builtin_aarch64_combinev4hf (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ return (float32x4_t) __builtin_aarch64_combinev2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ return (uint16x8_t) __builtin_aarch64_combinev4hi ((int16x4_t) __a,
+ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ return (uint32x4_t) __builtin_aarch64_combinev2si ((int32x2_t) __a,
+ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+ return (uint64x2_t) __builtin_aarch64_combinedi (__a[0], __b[0]);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_f64 (float64x1_t __a, float64x1_t __b)
+ {
+ return __builtin_aarch64_combinedf (__a[0], __b[0]);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+ return (poly8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a,
+ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vcombine_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+ return (poly16x8_t) __builtin_aarch64_combinev4hi ((int16x4_t) __a,
+@@ -5176,7 +5928,8 @@ vcombine_p16 (poly16x4_t __a, poly16x4_t __b)
+
+ /* Start of temporary inline asm implementations. */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
+ {
+ int8x8_t result;
+@@ -5187,7 +5940,8 @@ vaba_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
+ {
+ int16x4_t result;
+@@ -5198,7 +5952,8 @@ vaba_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
+ {
+ int32x2_t result;
+@@ -5209,7 +5964,8 @@ vaba_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
+ {
+ uint8x8_t result;
+@@ -5220,7 +5976,8 @@ vaba_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
+ {
+ uint16x4_t result;
+@@ -5231,7 +5988,8 @@ vaba_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaba_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
+ {
+ uint32x2_t result;
+@@ -5242,7 +6000,8 @@ vaba_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
+ {
+ int16x8_t result;
+@@ -5253,7 +6012,8 @@ vabal_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+ {
+ int32x4_t result;
+@@ -5264,7 +6024,8 @@ vabal_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
+ {
+ int64x2_t result;
+@@ -5275,7 +6036,8 @@ vabal_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
+ {
+ uint16x8_t result;
+@@ -5286,7 +6048,8 @@ vabal_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
+ {
+ uint32x4_t result;
+@@ -5297,7 +6060,8 @@ vabal_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
+ {
+ uint64x2_t result;
+@@ -5308,7 +6072,8 @@ vabal_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
+ {
+ int16x8_t result;
+@@ -5319,7 +6084,8 @@ vabal_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
+ {
+ int32x4_t result;
+@@ -5330,7 +6096,8 @@ vabal_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
+ {
+ int64x2_t result;
+@@ -5341,7 +6108,8 @@ vabal_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
+ {
+ uint16x8_t result;
+@@ -5352,7 +6120,8 @@ vabal_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
+ {
+ uint32x4_t result;
+@@ -5363,7 +6132,8 @@ vabal_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabal_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
+ {
+ uint64x2_t result;
+@@ -5374,7 +6144,8 @@ vabal_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
+ return result;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
+ {
+ int8x16_t result;
+@@ -5385,7 +6156,8 @@ vabaq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
+ {
+ int16x8_t result;
+@@ -5396,7 +6168,8 @@ vabaq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
+ {
+ int32x4_t result;
+@@ -5407,7 +6180,8 @@ vabaq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
+ {
+ uint8x16_t result;
+@@ -5418,7 +6192,8 @@ vabaq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
+ {
+ uint16x8_t result;
+@@ -5429,7 +6204,8 @@ vabaq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
+ {
+ uint32x4_t result;
+@@ -5440,18 +6216,8 @@ vabaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
+ return result;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vabd_f32 (float32x2_t a, float32x2_t b)
+-{
+- float32x2_t result;
+- __asm__ ("fabd %0.2s, %1.2s, %2.2s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_s8 (int8x8_t a, int8x8_t b)
+ {
+ int8x8_t result;
+@@ -5462,7 +6228,8 @@ vabd_s8 (int8x8_t a, int8x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_s16 (int16x4_t a, int16x4_t b)
+ {
+ int16x4_t result;
+@@ -5473,7 +6240,8 @@ vabd_s16 (int16x4_t a, int16x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_s32 (int32x2_t a, int32x2_t b)
+ {
+ int32x2_t result;
+@@ -5484,7 +6252,8 @@ vabd_s32 (int32x2_t a, int32x2_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_u8 (uint8x8_t a, uint8x8_t b)
+ {
+ uint8x8_t result;
+@@ -5495,7 +6264,8 @@ vabd_u8 (uint8x8_t a, uint8x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_u16 (uint16x4_t a, uint16x4_t b)
+ {
+ uint16x4_t result;
+@@ -5506,7 +6276,8 @@ vabd_u16 (uint16x4_t a, uint16x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabd_u32 (uint32x2_t a, uint32x2_t b)
+ {
+ uint32x2_t result;
+@@ -5517,18 +6288,8 @@ vabd_u32 (uint32x2_t a, uint32x2_t b)
+ return result;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vabdd_f64 (float64_t a, float64_t b)
+-{
+- float64_t result;
+- __asm__ ("fabd %d0, %d1, %d2"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_s8 (int8x16_t a, int8x16_t b)
+ {
+ int16x8_t result;
+@@ -5539,7 +6300,8 @@ vabdl_high_s8 (int8x16_t a, int8x16_t b)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_s16 (int16x8_t a, int16x8_t b)
+ {
+ int32x4_t result;
+@@ -5550,7 +6312,8 @@ vabdl_high_s16 (int16x8_t a, int16x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_s32 (int32x4_t a, int32x4_t b)
+ {
+ int64x2_t result;
+@@ -5561,7 +6324,8 @@ vabdl_high_s32 (int32x4_t a, int32x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_u8 (uint8x16_t a, uint8x16_t b)
+ {
+ uint16x8_t result;
+@@ -5572,7 +6336,8 @@ vabdl_high_u8 (uint8x16_t a, uint8x16_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_u16 (uint16x8_t a, uint16x8_t b)
+ {
+ uint32x4_t result;
+@@ -5583,7 +6348,8 @@ vabdl_high_u16 (uint16x8_t a, uint16x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_high_u32 (uint32x4_t a, uint32x4_t b)
+ {
+ uint64x2_t result;
+@@ -5594,7 +6360,8 @@ vabdl_high_u32 (uint32x4_t a, uint32x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_s8 (int8x8_t a, int8x8_t b)
+ {
+ int16x8_t result;
+@@ -5605,7 +6372,8 @@ vabdl_s8 (int8x8_t a, int8x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_s16 (int16x4_t a, int16x4_t b)
+ {
+ int32x4_t result;
+@@ -5616,7 +6384,8 @@ vabdl_s16 (int16x4_t a, int16x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_s32 (int32x2_t a, int32x2_t b)
+ {
+ int64x2_t result;
+@@ -5627,7 +6396,8 @@ vabdl_s32 (int32x2_t a, int32x2_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_u8 (uint8x8_t a, uint8x8_t b)
+ {
+ uint16x8_t result;
+@@ -5638,7 +6408,8 @@ vabdl_u8 (uint8x8_t a, uint8x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_u16 (uint16x4_t a, uint16x4_t b)
+ {
+ uint32x4_t result;
+@@ -5649,7 +6420,8 @@ vabdl_u16 (uint16x4_t a, uint16x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdl_u32 (uint32x2_t a, uint32x2_t b)
+ {
+ uint64x2_t result;
+@@ -5660,29 +6432,8 @@ vabdl_u32 (uint32x2_t a, uint32x2_t b)
+ return result;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vabdq_f32 (float32x4_t a, float32x4_t b)
+-{
+- float32x4_t result;
+- __asm__ ("fabd %0.4s, %1.4s, %2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vabdq_f64 (float64x2_t a, float64x2_t b)
+-{
+- float64x2_t result;
+- __asm__ ("fabd %0.2d, %1.2d, %2.2d"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_s8 (int8x16_t a, int8x16_t b)
+ {
+ int8x16_t result;
+@@ -5693,7 +6444,8 @@ vabdq_s8 (int8x16_t a, int8x16_t b)
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_s16 (int16x8_t a, int16x8_t b)
+ {
+ int16x8_t result;
+@@ -5704,7 +6456,8 @@ vabdq_s16 (int16x8_t a, int16x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_s32 (int32x4_t a, int32x4_t b)
+ {
+ int32x4_t result;
+@@ -5715,7 +6468,8 @@ vabdq_s32 (int32x4_t a, int32x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_u8 (uint8x16_t a, uint8x16_t b)
+ {
+ uint8x16_t result;
+@@ -5726,7 +6480,8 @@ vabdq_u8 (uint8x16_t a, uint8x16_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_u16 (uint16x8_t a, uint16x8_t b)
+ {
+ uint16x8_t result;
+@@ -5737,7 +6492,8 @@ vabdq_u16 (uint16x8_t a, uint16x8_t b)
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vabdq_u32 (uint32x4_t a, uint32x4_t b)
+ {
+ uint32x4_t result;
+@@ -5748,18 +6504,8 @@ vabdq_u32 (uint32x4_t a, uint32x4_t b)
+ return result;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vabds_f32 (float32_t a, float32_t b)
+-{
+- float32_t result;
+- __asm__ ("fabd %s0, %s1, %s2"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlv_s8 (int8x8_t a)
+ {
+ int16_t result;
+@@ -5770,7 +6516,8 @@ vaddlv_s8 (int8x8_t a)
+ return result;
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlv_s16 (int16x4_t a)
+ {
+ int32_t result;
+@@ -5781,7 +6528,8 @@ vaddlv_s16 (int16x4_t a)
+ return result;
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlv_u8 (uint8x8_t a)
+ {
+ uint16_t result;
+@@ -5792,7 +6540,8 @@ vaddlv_u8 (uint8x8_t a)
+ return result;
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlv_u16 (uint16x4_t a)
+ {
+ uint32_t result;
+@@ -5803,7 +6552,8 @@ vaddlv_u16 (uint16x4_t a)
+ return result;
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_s8 (int8x16_t a)
+ {
+ int16_t result;
+@@ -5814,7 +6564,8 @@ vaddlvq_s8 (int8x16_t a)
+ return result;
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_s16 (int16x8_t a)
+ {
+ int32_t result;
+@@ -5825,7 +6576,8 @@ vaddlvq_s16 (int16x8_t a)
+ return result;
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_s32 (int32x4_t a)
+ {
+ int64_t result;
+@@ -5836,7 +6588,8 @@ vaddlvq_s32 (int32x4_t a)
+ return result;
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_u8 (uint8x16_t a)
+ {
+ uint16_t result;
+@@ -5847,7 +6600,8 @@ vaddlvq_u8 (uint8x16_t a)
+ return result;
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_u16 (uint16x8_t a)
+ {
+ uint32_t result;
+@@ -5858,7 +6612,8 @@ vaddlvq_u16 (uint16x8_t a)
+ return result;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+ vaddlvq_u32 (uint32x4_t a)
+ {
+ uint64_t result;
+@@ -5869,18584 +6624,22583 @@ vaddlvq_u32 (uint32x4_t a)
+ return result;
+ }
+
+-#define vcopyq_lane_f32(a, b, c, d) \
+- __extension__ \
+- ({ \
+- float32x4_t c_ = (c); \
+- float32x4_t a_ = (a); \
+- float32x4_t result; \
+- __asm__ ("ins %0.s[%2], %3.s[%4]" \
+- : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtx_f32_f64 (float64x2_t a)
++{
++ float32x2_t result;
++ __asm__ ("fcvtxn %0.2s,%1.2d"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcopyq_lane_f64(a, b, c, d) \
+- __extension__ \
+- ({ \
+- float64x2_t c_ = (c); \
+- float64x2_t a_ = (a); \
+- float64x2_t result; \
+- __asm__ ("ins %0.d[%2], %3.d[%4]" \
+- : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtx_high_f32_f64 (float32x2_t a, float64x2_t b)
++{
++ float32x4_t result;
++ __asm__ ("fcvtxn2 %0.4s,%1.2d"
++ : "=w"(result)
++ : "w" (b), "0"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtxd_f32_f64 (float64_t a)
++{
++ float32_t result;
++ __asm__ ("fcvtxn %s0,%d1"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_n_f32 (float32x2_t a, float32x2_t b, float32_t c)
++{
++ float32x2_t result;
++ float32x2_t t1;
++ __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fadd %0.2s, %0.2s, %1.2s"
++ : "=w"(result), "=w"(t1)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_n_s16 (int16x4_t a, int16x4_t b, int16_t c)
++{
++ int16x4_t result;
++ __asm__ ("mla %0.4h,%2.4h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_n_s32 (int32x2_t a, int32x2_t b, int32_t c)
++{
++ int32x2_t result;
++ __asm__ ("mla %0.2s,%2.2s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_n_u16 (uint16x4_t a, uint16x4_t b, uint16_t c)
++{
++ uint16x4_t result;
++ __asm__ ("mla %0.4h,%2.4h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_n_u32 (uint32x2_t a, uint32x2_t b, uint32_t c)
++{
++ uint32x2_t result;
++ __asm__ ("mla %0.2s,%2.2s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
++{
++ int8x8_t result;
++ __asm__ ("mla %0.8b, %2.8b, %3.8b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
++{
++ int16x4_t result;
++ __asm__ ("mla %0.4h, %2.4h, %3.4h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
++{
++ int32x2_t result;
++ __asm__ ("mla %0.2s, %2.2s, %3.2s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
++{
++ uint8x8_t result;
++ __asm__ ("mla %0.8b, %2.8b, %3.8b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
++{
++ uint16x4_t result;
++ __asm__ ("mla %0.4h, %2.4h, %3.4h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
++{
++ uint32x2_t result;
++ __asm__ ("mla %0.2s, %2.2s, %3.2s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcopyq_lane_p8(a, b, c, d) \
++#define vmlal_high_lane_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- poly8x16_t c_ = (c); \
+- poly8x16_t a_ = (a); \
+- poly8x16_t result; \
+- __asm__ ("ins %0.b[%2], %3.b[%4]" \
++ int16x4_t c_ = (c); \
++ int16x8_t b_ = (b); \
++ int32x4_t a_ = (a); \
++ int32x4_t result; \
++ __asm__ ("smlal2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_p16(a, b, c, d) \
++#define vmlal_high_lane_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- poly16x8_t c_ = (c); \
+- poly16x8_t a_ = (a); \
+- poly16x8_t result; \
+- __asm__ ("ins %0.h[%2], %3.h[%4]" \
++ int32x2_t c_ = (c); \
++ int32x4_t b_ = (b); \
++ int64x2_t a_ = (a); \
++ int64x2_t result; \
++ __asm__ ("smlal2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_s8(a, b, c, d) \
++#define vmlal_high_lane_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- int8x16_t c_ = (c); \
+- int8x16_t a_ = (a); \
+- int8x16_t result; \
+- __asm__ ("ins %0.b[%2], %3.b[%4]" \
++ uint16x4_t c_ = (c); \
++ uint16x8_t b_ = (b); \
++ uint32x4_t a_ = (a); \
++ uint32x4_t result; \
++ __asm__ ("umlal2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_s16(a, b, c, d) \
++#define vmlal_high_lane_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- int16x8_t c_ = (c); \
+- int16x8_t a_ = (a); \
+- int16x8_t result; \
+- __asm__ ("ins %0.h[%2], %3.h[%4]" \
++ uint32x2_t c_ = (c); \
++ uint32x4_t b_ = (b); \
++ uint64x2_t a_ = (a); \
++ uint64x2_t result; \
++ __asm__ ("umlal2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_s32(a, b, c, d) \
++#define vmlal_high_laneq_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- int32x4_t c_ = (c); \
++ int16x8_t c_ = (c); \
++ int16x8_t b_ = (b); \
+ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("ins %0.s[%2], %3.s[%4]" \
++ __asm__ ("smlal2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_s64(a, b, c, d) \
++#define vmlal_high_laneq_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- int64x2_t c_ = (c); \
++ int32x4_t c_ = (c); \
++ int32x4_t b_ = (b); \
+ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("ins %0.d[%2], %3.d[%4]" \
+- : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vcopyq_lane_u8(a, b, c, d) \
+- __extension__ \
+- ({ \
+- uint8x16_t c_ = (c); \
+- uint8x16_t a_ = (a); \
+- uint8x16_t result; \
+- __asm__ ("ins %0.b[%2], %3.b[%4]" \
++ __asm__ ("smlal2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_u16(a, b, c, d) \
++#define vmlal_high_laneq_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint16x8_t c_ = (c); \
+- uint16x8_t a_ = (a); \
+- uint16x8_t result; \
+- __asm__ ("ins %0.h[%2], %3.h[%4]" \
++ uint16x8_t b_ = (b); \
++ uint32x4_t a_ = (a); \
++ uint32x4_t result; \
++ __asm__ ("umlal2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_u32(a, b, c, d) \
++#define vmlal_high_laneq_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint32x4_t c_ = (c); \
+- uint32x4_t a_ = (a); \
+- uint32x4_t result; \
+- __asm__ ("ins %0.s[%2], %3.s[%4]" \
++ uint32x4_t b_ = (b); \
++ uint64x2_t a_ = (a); \
++ uint64x2_t result; \
++ __asm__ ("umlal2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcopyq_lane_u64(a, b, c, d) \
+- __extension__ \
+- ({ \
+- uint64x2_t c_ = (c); \
+- uint64x2_t a_ = (a); \
+- uint64x2_t result; \
+- __asm__ ("ins %0.d[%2], %3.d[%4]" \
+- : "=w"(result) \
+- : "0"(a_), "i"(b), "w"(c_), "i"(d) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_n_s16 (int32x4_t a, int16x8_t b, int16_t c)
++{
++ int32x4_t result;
++ __asm__ ("smlal2 %0.4s,%2.8h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvt_n_f32_s32(a, b) \
+- __extension__ \
+- ({ \
+- int32x2_t a_ = (a); \
+- float32x2_t result; \
+- __asm__ ("scvtf %0.2s, %1.2s, #%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_n_s32 (int64x2_t a, int32x4_t b, int32_t c)
++{
++ int64x2_t result;
++ __asm__ ("smlal2 %0.2d,%2.4s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvt_n_f32_u32(a, b) \
+- __extension__ \
+- ({ \
+- uint32x2_t a_ = (a); \
+- float32x2_t result; \
+- __asm__ ("ucvtf %0.2s, %1.2s, #%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_n_u16 (uint32x4_t a, uint16x8_t b, uint16_t c)
++{
++ uint32x4_t result;
++ __asm__ ("umlal2 %0.4s,%2.8h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvt_n_s32_f32(a, b) \
+- __extension__ \
+- ({ \
+- float32x2_t a_ = (a); \
+- int32x2_t result; \
+- __asm__ ("fcvtzs %0.2s, %1.2s, #%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_n_u32 (uint64x2_t a, uint32x4_t b, uint32_t c)
++{
++ uint64x2_t result;
++ __asm__ ("umlal2 %0.2d,%2.4s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvt_n_u32_f32(a, b) \
+- __extension__ \
+- ({ \
+- float32x2_t a_ = (a); \
+- uint32x2_t result; \
+- __asm__ ("fcvtzu %0.2s, %1.2s, #%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
++{
++ int16x8_t result;
++ __asm__ ("smlal2 %0.8h,%2.16b,%3.16b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvtd_n_f64_s64(a, b) \
+- __extension__ \
+- ({ \
+- int64_t a_ = (a); \
+- float64_t result; \
+- __asm__ ("scvtf %d0,%d1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
++{
++ int32x4_t result;
++ __asm__ ("smlal2 %0.4s,%2.8h,%3.8h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvtd_n_f64_u64(a, b) \
+- __extension__ \
+- ({ \
+- uint64_t a_ = (a); \
+- float64_t result; \
+- __asm__ ("ucvtf %d0,%d1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
++{
++ int64x2_t result;
++ __asm__ ("smlal2 %0.2d,%2.4s,%3.4s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvtd_n_s64_f64(a, b) \
+- __extension__ \
+- ({ \
+- float64_t a_ = (a); \
+- int64_t result; \
+- __asm__ ("fcvtzs %d0,%d1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
++{
++ uint16x8_t result;
++ __asm__ ("umlal2 %0.8h,%2.16b,%3.16b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvtd_n_u64_f64(a, b) \
+- __extension__ \
+- ({ \
+- float64_t a_ = (a); \
+- uint64_t result; \
+- __asm__ ("fcvtzu %d0,%d1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
++{
++ uint32x4_t result;
++ __asm__ ("umlal2 %0.4s,%2.8h,%3.8h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
++{
++ uint64x2_t result;
++ __asm__ ("umlal2 %0.2d,%2.4s,%3.4s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvtq_n_f32_s32(a, b) \
++#define vmlal_lane_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
++ int16x4_t c_ = (c); \
++ int16x4_t b_ = (b); \
+ int32x4_t a_ = (a); \
+- float32x4_t result; \
+- __asm__ ("scvtf %0.4s, %1.4s, #%2" \
++ int32x4_t result; \
++ __asm__ ("smlal %0.4s,%2.4h,%3.h[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_f32_u32(a, b) \
++#define vmlal_lane_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- uint32x4_t a_ = (a); \
+- float32x4_t result; \
+- __asm__ ("ucvtf %0.4s, %1.4s, #%2" \
++ int32x2_t c_ = (c); \
++ int32x2_t b_ = (b); \
++ int64x2_t a_ = (a); \
++ int64x2_t result; \
++ __asm__ ("smlal %0.2d,%2.2s,%3.s[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_f64_s64(a, b) \
++#define vmlal_lane_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- int64x2_t a_ = (a); \
+- float64x2_t result; \
+- __asm__ ("scvtf %0.2d, %1.2d, #%2" \
++ uint16x4_t c_ = (c); \
++ uint16x4_t b_ = (b); \
++ uint32x4_t a_ = (a); \
++ uint32x4_t result; \
++ __asm__ ("umlal %0.4s,%2.4h,%3.h[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_f64_u64(a, b) \
++#define vmlal_lane_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
++ uint32x2_t c_ = (c); \
++ uint32x2_t b_ = (b); \
+ uint64x2_t a_ = (a); \
+- float64x2_t result; \
+- __asm__ ("ucvtf %0.2d, %1.2d, #%2" \
++ uint64x2_t result; \
++ __asm__ ("umlal %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_s32_f32(a, b) \
++#define vmlal_laneq_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- float32x4_t a_ = (a); \
++ int16x8_t c_ = (c); \
++ int16x4_t b_ = (b); \
++ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("fcvtzs %0.4s, %1.4s, #%2" \
++ __asm__ ("smlal %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_s64_f64(a, b) \
++#define vmlal_laneq_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- float64x2_t a_ = (a); \
++ int32x4_t c_ = (c); \
++ int32x2_t b_ = (b); \
++ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("fcvtzs %0.2d, %1.2d, #%2" \
++ __asm__ ("smlal %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_u32_f32(a, b) \
++#define vmlal_laneq_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+- float32x4_t a_ = (a); \
++ uint16x8_t c_ = (c); \
++ uint16x4_t b_ = (b); \
++ uint32x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("fcvtzu %0.4s, %1.4s, #%2" \
++ __asm__ ("umlal %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvtq_n_u64_f64(a, b) \
++#define vmlal_laneq_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+- float64x2_t a_ = (a); \
++ uint32x4_t c_ = (c); \
++ uint32x2_t b_ = (b); \
++ uint64x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("fcvtzu %0.2d, %1.2d, #%2" \
++ __asm__ ("umlal %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+- : "w"(a_), "i"(b) \
++ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vcvts_n_f32_s32(a, b) \
+- __extension__ \
+- ({ \
+- int32_t a_ = (a); \
+- float32_t result; \
+- __asm__ ("scvtf %s0,%s1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_n_s16 (int32x4_t a, int16x4_t b, int16_t c)
++{
++ int32x4_t result;
++ __asm__ ("smlal %0.4s,%2.4h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvts_n_f32_u32(a, b) \
+- __extension__ \
+- ({ \
+- uint32_t a_ = (a); \
+- float32_t result; \
+- __asm__ ("ucvtf %s0,%s1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_n_s32 (int64x2_t a, int32x2_t b, int32_t c)
++{
++ int64x2_t result;
++ __asm__ ("smlal %0.2d,%2.2s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvts_n_s32_f32(a, b) \
+- __extension__ \
+- ({ \
+- float32_t a_ = (a); \
+- int32_t result; \
+- __asm__ ("fcvtzs %s0,%s1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_n_u16 (uint32x4_t a, uint16x4_t b, uint16_t c)
++{
++ uint32x4_t result;
++ __asm__ ("umlal %0.4s,%2.4h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vcvts_n_u32_f32(a, b) \
+- __extension__ \
+- ({ \
+- float32_t a_ = (a); \
+- uint32_t result; \
+- __asm__ ("fcvtzu %s0,%s1,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_n_u32 (uint64x2_t a, uint32x2_t b, uint32_t c)
++{
++ uint64x2_t result;
++ __asm__ ("umlal %0.2d,%2.2s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vcvtx_f32_f64 (float64x2_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
+ {
+- float32x2_t result;
+- __asm__ ("fcvtxn %0.2s,%1.2d"
++ int16x8_t result;
++ __asm__ ("smlal %0.8h,%2.8b,%3.8b"
+ : "=w"(result)
+- : "w"(a)
++ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvtx_high_f32_f64 (float32x2_t a, float64x2_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
++{
++ int32x4_t result;
++ __asm__ ("smlal %0.4s,%2.4h,%3.4h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
++{
++ int64x2_t result;
++ __asm__ ("smlal %0.2d,%2.2s,%3.2s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
++{
++ uint16x8_t result;
++ __asm__ ("umlal %0.8h,%2.8b,%3.8b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
++{
++ uint32x4_t result;
++ __asm__ ("umlal %0.4s,%2.4h,%3.4h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlal_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
++{
++ uint64x2_t result;
++ __asm__ ("umlal %0.2d,%2.2s,%3.2s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
+ {
+ float32x4_t result;
+- __asm__ ("fcvtxn2 %0.4s,%1.2d"
++ float32x4_t t1;
++ __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fadd %0.4s, %0.4s, %1.4s"
++ : "=w"(result), "=w"(t1)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
++{
++ int16x8_t result;
++ __asm__ ("mla %0.8h,%2.8h,%3.h[0]"
+ : "=w"(result)
+- : "w" (b), "0"(a)
++ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vcvtxd_f32_f64 (float64_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_n_s32 (int32x4_t a, int32x4_t b, int32_t c)
+ {
+- float32_t result;
+- __asm__ ("fcvtxn %s0,%d1"
++ int32x4_t result;
++ __asm__ ("mla %0.4s,%2.4s,%3.s[0]"
+ : "=w"(result)
+- : "w"(a)
++ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmla_n_f32 (float32x2_t a, float32x2_t b, float32_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_n_u16 (uint16x8_t a, uint16x8_t b, uint16_t c)
++{
++ uint16x8_t result;
++ __asm__ ("mla %0.8h,%2.8h,%3.h[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "x"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_n_u32 (uint32x4_t a, uint32x4_t b, uint32_t c)
++{
++ uint32x4_t result;
++ __asm__ ("mla %0.4s,%2.4s,%3.s[0]"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
++{
++ int8x16_t result;
++ __asm__ ("mla %0.16b, %2.16b, %3.16b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
++{
++ int16x8_t result;
++ __asm__ ("mla %0.8h, %2.8h, %3.8h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
++{
++ int32x4_t result;
++ __asm__ ("mla %0.4s, %2.4s, %3.4s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
++{
++ uint8x16_t result;
++ __asm__ ("mla %0.16b, %2.16b, %3.16b"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
++{
++ uint16x8_t result;
++ __asm__ ("mla %0.8h, %2.8h, %3.8h"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
++{
++ uint32x4_t result;
++ __asm__ ("mla %0.4s, %2.4s, %3.4s"
++ : "=w"(result)
++ : "0"(a), "w"(b), "w"(c)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_n_f32 (float32x2_t a, float32x2_t b, float32_t c)
+ {
+ float32x2_t result;
+ float32x2_t t1;
+- __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fadd %0.2s, %0.2s, %1.2s"
++ __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fsub %0.2s, %0.2s, %1.2s"
+ : "=w"(result), "=w"(t1)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmla_n_s16 (int16x4_t a, int16x4_t b, int16_t c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_n_s16 (int16x4_t a, int16x4_t b, int16_t c)
+ {
+ int16x4_t result;
+- __asm__ ("mla %0.4h,%2.4h,%3.h[0]"
++ __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmla_n_s32 (int32x2_t a, int32x2_t b, int32_t c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_n_s32 (int32x2_t a, int32x2_t b, int32_t c)
+ {
+ int32x2_t result;
+- __asm__ ("mla %0.2s,%2.2s,%3.s[0]"
++ __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmla_n_u16 (uint16x4_t a, uint16x4_t b, uint16_t c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_n_u16 (uint16x4_t a, uint16x4_t b, uint16_t c)
+ {
+ uint16x4_t result;
+- __asm__ ("mla %0.4h,%2.4h,%3.h[0]"
++ __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmla_n_u32 (uint32x2_t a, uint32x2_t b, uint32_t c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_n_u32 (uint32x2_t a, uint32x2_t b, uint32_t c)
+ {
+ uint32x2_t result;
+- __asm__ ("mla %0.2s,%2.2s,%3.s[0]"
++ __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmla_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
+ {
+ int8x8_t result;
+- __asm__ ("mla %0.8b, %2.8b, %3.8b"
++ __asm__ ("mls %0.8b,%2.8b,%3.8b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmla_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
+ {
+ int16x4_t result;
+- __asm__ ("mla %0.4h, %2.4h, %3.4h"
++ __asm__ ("mls %0.4h,%2.4h,%3.4h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmla_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
+ {
+ int32x2_t result;
+- __asm__ ("mla %0.2s, %2.2s, %3.2s"
++ __asm__ ("mls %0.2s,%2.2s,%3.2s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmla_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
+ {
+ uint8x8_t result;
+- __asm__ ("mla %0.8b, %2.8b, %3.8b"
++ __asm__ ("mls %0.8b,%2.8b,%3.8b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmla_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
+ {
+ uint16x4_t result;
+- __asm__ ("mla %0.4h, %2.4h, %3.4h"
++ __asm__ ("mls %0.4h,%2.4h,%3.4h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmla_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
+ {
+ uint32x2_t result;
+- __asm__ ("mla %0.2s, %2.2s, %3.2s"
++ __asm__ ("mls %0.2s,%2.2s,%3.2s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-#define vmlal_high_lane_s16(a, b, c, d) \
++#define vmlsl_high_lane_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int16x4_t c_ = (c); \
+ int16x8_t b_ = (b); \
+ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlal2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_lane_s32(a, b, c, d) \
++#define vmlsl_high_lane_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int32x2_t c_ = (c); \
+ int32x4_t b_ = (b); \
+ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlal2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_lane_u16(a, b, c, d) \
++#define vmlsl_high_lane_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint16x4_t c_ = (c); \
+ uint16x8_t b_ = (b); \
+ uint32x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlal2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_lane_u32(a, b, c, d) \
++#define vmlsl_high_lane_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint32x2_t c_ = (c); \
+ uint32x4_t b_ = (b); \
+ uint64x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlal2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_laneq_s16(a, b, c, d) \
++#define vmlsl_high_laneq_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int16x8_t c_ = (c); \
+ int16x8_t b_ = (b); \
+ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlal2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_laneq_s32(a, b, c, d) \
++#define vmlsl_high_laneq_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int32x4_t c_ = (c); \
+ int32x4_t b_ = (b); \
+ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlal2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_laneq_u16(a, b, c, d) \
++#define vmlsl_high_laneq_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint16x8_t c_ = (c); \
+ uint16x8_t b_ = (b); \
+ uint32x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlal2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_high_laneq_u32(a, b, c, d) \
++#define vmlsl_high_laneq_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint32x4_t c_ = (c); \
+ uint32x4_t b_ = (b); \
+ uint64x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlal2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlal_high_n_s16 (int32x4_t a, int16x8_t b, int16_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_n_s16 (int32x4_t a, int16x8_t b, int16_t c)
+ {
+ int32x4_t result;
+- __asm__ ("smlal2 %0.4s,%2.8h,%3.h[0]"
++ __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlal_high_n_s32 (int64x2_t a, int32x4_t b, int32_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_n_s32 (int64x2_t a, int32x4_t b, int32_t c)
+ {
+ int64x2_t result;
+- __asm__ ("smlal2 %0.2d,%2.4s,%3.s[0]"
++ __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlal_high_n_u16 (uint32x4_t a, uint16x8_t b, uint16_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_n_u16 (uint32x4_t a, uint16x8_t b, uint16_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("umlal2 %0.4s,%2.8h,%3.h[0]"
++ __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlal_high_n_u32 (uint64x2_t a, uint32x4_t b, uint32_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_n_u32 (uint64x2_t a, uint32x4_t b, uint32_t c)
+ {
+ uint64x2_t result;
+- __asm__ ("umlal2 %0.2d,%2.4s,%3.s[0]"
++ __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlal_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
+ {
+ int16x8_t result;
+- __asm__ ("smlal2 %0.8h,%2.16b,%3.16b"
++ __asm__ ("smlsl2 %0.8h,%2.16b,%3.16b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlal_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+ {
+ int32x4_t result;
+- __asm__ ("smlal2 %0.4s,%2.8h,%3.8h"
++ __asm__ ("smlsl2 %0.4s,%2.8h,%3.8h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlal_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
+ {
+ int64x2_t result;
+- __asm__ ("smlal2 %0.2d,%2.4s,%3.4s"
++ __asm__ ("smlsl2 %0.2d,%2.4s,%3.4s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlal_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
+ {
+ uint16x8_t result;
+- __asm__ ("umlal2 %0.8h,%2.16b,%3.16b"
++ __asm__ ("umlsl2 %0.8h,%2.16b,%3.16b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlal_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("umlal2 %0.4s,%2.8h,%3.8h"
++ __asm__ ("umlsl2 %0.4s,%2.8h,%3.8h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlal_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
+ {
+ uint64x2_t result;
+- __asm__ ("umlal2 %0.2d,%2.4s,%3.4s"
++ __asm__ ("umlsl2 %0.2d,%2.4s,%3.4s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-#define vmlal_lane_s16(a, b, c, d) \
++#define vmlsl_lane_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int16x4_t c_ = (c); \
+ int16x4_t b_ = (b); \
+ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlal %0.4s,%2.4h,%3.h[%4]" \
++ __asm__ ("smlsl %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_lane_s32(a, b, c, d) \
++#define vmlsl_lane_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int32x2_t c_ = (c); \
+ int32x2_t b_ = (b); \
+ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlal %0.2d,%2.2s,%3.s[%4]" \
++ __asm__ ("smlsl %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_lane_u16(a, b, c, d) \
++#define vmlsl_lane_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint16x4_t c_ = (c); \
+ uint16x4_t b_ = (b); \
+ uint32x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlal %0.4s,%2.4h,%3.h[%4]" \
++ __asm__ ("umlsl %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_lane_u32(a, b, c, d) \
++#define vmlsl_lane_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint32x2_t c_ = (c); \
+ uint32x2_t b_ = (b); \
+ uint64x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlal %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("umlsl %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_laneq_s16(a, b, c, d) \
++#define vmlsl_laneq_s16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int16x8_t c_ = (c); \
+ int16x4_t b_ = (b); \
+ int32x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlal %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("smlsl %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_laneq_s32(a, b, c, d) \
++#define vmlsl_laneq_s32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ int32x4_t c_ = (c); \
+ int32x2_t b_ = (b); \
+ int64x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlal %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("smlsl %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_laneq_u16(a, b, c, d) \
++#define vmlsl_laneq_u16(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint16x8_t c_ = (c); \
+ uint16x4_t b_ = (b); \
+ uint32x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlal %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("umlsl %0.4s, %2.4h, %3.h[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlal_laneq_u32(a, b, c, d) \
++#define vmlsl_laneq_u32(a, b, c, d) \
+ __extension__ \
+ ({ \
+ uint32x4_t c_ = (c); \
+ uint32x2_t b_ = (b); \
+ uint64x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlal %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("umlsl %0.2d, %2.2s, %3.s[%4]" \
+ : "=w"(result) \
+ : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlal_n_s16 (int32x4_t a, int16x4_t b, int16_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_n_s16 (int32x4_t a, int16x4_t b, int16_t c)
+ {
+ int32x4_t result;
+- __asm__ ("smlal %0.4s,%2.4h,%3.h[0]"
++ __asm__ ("smlsl %0.4s, %2.4h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlal_n_s32 (int64x2_t a, int32x2_t b, int32_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_n_s32 (int64x2_t a, int32x2_t b, int32_t c)
+ {
+ int64x2_t result;
+- __asm__ ("smlal %0.2d,%2.2s,%3.s[0]"
++ __asm__ ("smlsl %0.2d, %2.2s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlal_n_u16 (uint32x4_t a, uint16x4_t b, uint16_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_n_u16 (uint32x4_t a, uint16x4_t b, uint16_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("umlal %0.4s,%2.4h,%3.h[0]"
++ __asm__ ("umlsl %0.4s, %2.4h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlal_n_u32 (uint64x2_t a, uint32x2_t b, uint32_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_n_u32 (uint64x2_t a, uint32x2_t b, uint32_t c)
+ {
+ uint64x2_t result;
+- __asm__ ("umlal %0.2d,%2.2s,%3.s[0]"
++ __asm__ ("umlsl %0.2d, %2.2s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlal_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
+ {
+ int16x8_t result;
+- __asm__ ("smlal %0.8h,%2.8b,%3.8b"
++ __asm__ ("smlsl %0.8h, %2.8b, %3.8b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlal_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
+ {
+ int32x4_t result;
+- __asm__ ("smlal %0.4s,%2.4h,%3.4h"
++ __asm__ ("smlsl %0.4s, %2.4h, %3.4h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlal_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
+ {
+ int64x2_t result;
+- __asm__ ("smlal %0.2d,%2.2s,%3.2s"
++ __asm__ ("smlsl %0.2d, %2.2s, %3.2s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlal_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
+ {
+ uint16x8_t result;
+- __asm__ ("umlal %0.8h,%2.8b,%3.8b"
++ __asm__ ("umlsl %0.8h, %2.8b, %3.8b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlal_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("umlal %0.4s,%2.4h,%3.4h"
++ __asm__ ("umlsl %0.4s, %2.4h, %3.4h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlal_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsl_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
+ {
+ uint64x2_t result;
+- __asm__ ("umlal %0.2d,%2.2s,%3.2s"
++ __asm__ ("umlsl %0.2d, %2.2s, %3.2s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
+ {
+ float32x4_t result;
+ float32x4_t t1;
+- __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fadd %0.4s, %0.4s, %1.4s"
++ __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fsub %0.4s, %0.4s, %1.4s"
+ : "=w"(result), "=w"(t1)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlaq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
+ {
+ int16x8_t result;
+- __asm__ ("mla %0.8h,%2.8h,%3.h[0]"
++ __asm__ ("mls %0.8h, %2.8h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlaq_n_s32 (int32x4_t a, int32x4_t b, int32_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_n_s32 (int32x4_t a, int32x4_t b, int32_t c)
+ {
+ int32x4_t result;
+- __asm__ ("mla %0.4s,%2.4s,%3.s[0]"
++ __asm__ ("mls %0.4s, %2.4s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlaq_n_u16 (uint16x8_t a, uint16x8_t b, uint16_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_n_u16 (uint16x8_t a, uint16x8_t b, uint16_t c)
+ {
+ uint16x8_t result;
+- __asm__ ("mla %0.8h,%2.8h,%3.h[0]"
++ __asm__ ("mls %0.8h, %2.8h, %3.h[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "x"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlaq_n_u32 (uint32x4_t a, uint32x4_t b, uint32_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_n_u32 (uint32x4_t a, uint32x4_t b, uint32_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("mla %0.4s,%2.4s,%3.s[0]"
++ __asm__ ("mls %0.4s, %2.4s, %3.s[0]"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmlaq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
+ {
+ int8x16_t result;
+- __asm__ ("mla %0.16b, %2.16b, %3.16b"
++ __asm__ ("mls %0.16b,%2.16b,%3.16b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlaq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
+ {
+ int16x8_t result;
+- __asm__ ("mla %0.8h, %2.8h, %3.8h"
++ __asm__ ("mls %0.8h,%2.8h,%3.8h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlaq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
+ {
+ int32x4_t result;
+- __asm__ ("mla %0.4s, %2.4s, %3.4s"
++ __asm__ ("mls %0.4s,%2.4s,%3.4s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmlaq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
+ {
+ uint8x16_t result;
+- __asm__ ("mla %0.16b, %2.16b, %3.16b"
++ __asm__ ("mls %0.16b,%2.16b,%3.16b"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlaq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
+ {
+ uint16x8_t result;
+- __asm__ ("mla %0.8h, %2.8h, %3.8h"
++ __asm__ ("mls %0.8h,%2.8h,%3.8h"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
+ {
+ uint32x4_t result;
+- __asm__ ("mla %0.4s, %2.4s, %3.4s"
++ __asm__ ("mls %0.4s,%2.4s,%3.4s"
+ : "=w"(result)
+ : "0"(a), "w"(b), "w"(c)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmls_n_f32 (float32x2_t a, float32x2_t b, float32_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_s8 (int8x16_t a)
+ {
+- float32x2_t result;
+- float32x2_t t1;
+- __asm__ ("fmul %1.2s, %3.2s, %4.s[0]; fsub %0.2s, %0.2s, %1.2s"
+- : "=w"(result), "=w"(t1)
+- : "0"(a), "w"(b), "w"(c)
++ int16x8_t result;
++ __asm__ ("sshll2 %0.8h,%1.16b,#0"
++ : "=w"(result)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmls_n_s16 (int16x4_t a, int16x4_t b, int16_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_s16 (int16x8_t a)
+ {
+- int16x4_t result;
+- __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
++ int32x4_t result;
++ __asm__ ("sshll2 %0.4s,%1.8h,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmls_n_s32 (int32x2_t a, int32x2_t b, int32_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_s32 (int32x4_t a)
+ {
+- int32x2_t result;
+- __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
++ int64x2_t result;
++ __asm__ ("sshll2 %0.2d,%1.4s,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmls_n_u16 (uint16x4_t a, uint16x4_t b, uint16_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_u8 (uint8x16_t a)
+ {
+- uint16x4_t result;
+- __asm__ ("mls %0.4h, %2.4h, %3.h[0]"
++ uint16x8_t result;
++ __asm__ ("ushll2 %0.8h,%1.16b,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmls_n_u32 (uint32x2_t a, uint32x2_t b, uint32_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_u16 (uint16x8_t a)
+ {
+- uint32x2_t result;
+- __asm__ ("mls %0.2s, %2.2s, %3.s[0]"
++ uint32x4_t result;
++ __asm__ ("ushll2 %0.4s,%1.8h,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmls_s8 (int8x8_t a, int8x8_t b, int8x8_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_high_u32 (uint32x4_t a)
+ {
+- int8x8_t result;
+- __asm__ ("mls %0.8b,%2.8b,%3.8b"
++ uint64x2_t result;
++ __asm__ ("ushll2 %0.2d,%1.4s,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmls_s16 (int16x4_t a, int16x4_t b, int16x4_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_s8 (int8x8_t a)
+ {
+- int16x4_t result;
+- __asm__ ("mls %0.4h,%2.4h,%3.4h"
++ int16x8_t result;
++ __asm__ ("sshll %0.8h,%1.8b,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmls_s32 (int32x2_t a, int32x2_t b, int32x2_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_s16 (int16x4_t a)
+ {
+- int32x2_t result;
+- __asm__ ("mls %0.2s,%2.2s,%3.2s"
++ int32x4_t result;
++ __asm__ ("sshll %0.4s,%1.4h,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmls_u8 (uint8x8_t a, uint8x8_t b, uint8x8_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_s32 (int32x2_t a)
+ {
+- uint8x8_t result;
+- __asm__ ("mls %0.8b,%2.8b,%3.8b"
++ int64x2_t result;
++ __asm__ ("sshll %0.2d,%1.2s,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmls_u16 (uint16x4_t a, uint16x4_t b, uint16x4_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_u8 (uint8x8_t a)
+ {
+- uint16x4_t result;
+- __asm__ ("mls %0.4h,%2.4h,%3.4h"
++ uint16x8_t result;
++ __asm__ ("ushll %0.8h,%1.8b,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmls_u32 (uint32x2_t a, uint32x2_t b, uint32x2_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_u16 (uint16x4_t a)
+ {
+- uint32x2_t result;
+- __asm__ ("mls %0.2s,%2.2s,%3.2s"
++ uint32x4_t result;
++ __asm__ ("ushll %0.4s,%1.4h,#0"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-#define vmlsl_high_lane_s16(a, b, c, d) \
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovl_u32 (uint32x2_t a)
++{
++ uint64x2_t result;
++ __asm__ ("ushll %0.2d,%1.2s,#0"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_s16 (int8x8_t a, int16x8_t b)
++{
++ int8x16_t result = vcombine_s8 (a, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.16b,%1.8h"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_s32 (int16x4_t a, int32x4_t b)
++{
++ int16x8_t result = vcombine_s16 (a, vcreate_s16 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.8h,%1.4s"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_s64 (int32x2_t a, int64x2_t b)
++{
++ int32x4_t result = vcombine_s32 (a, vcreate_s32 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.4s,%1.2d"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_u16 (uint8x8_t a, uint16x8_t b)
++{
++ uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.16b,%1.8h"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_u32 (uint16x4_t a, uint32x4_t b)
++{
++ uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.8h,%1.4s"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_high_u64 (uint32x2_t a, uint64x2_t b)
++{
++ uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("xtn2 %0.4s,%1.2d"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_s16 (int16x8_t a)
++{
++ int8x8_t result;
++ __asm__ ("xtn %0.8b,%1.8h"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_s32 (int32x4_t a)
++{
++ int16x4_t result;
++ __asm__ ("xtn %0.4h,%1.4s"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_s64 (int64x2_t a)
++{
++ int32x2_t result;
++ __asm__ ("xtn %0.2s,%1.2d"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_u16 (uint16x8_t a)
++{
++ uint8x8_t result;
++ __asm__ ("xtn %0.8b,%1.8h"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_u32 (uint32x4_t a)
++{
++ uint16x4_t result;
++ __asm__ ("xtn %0.4h,%1.4s"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovn_u64 (uint64x2_t a)
++{
++ uint32x2_t result;
++ __asm__ ("xtn %0.2s,%1.2d"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers */);
++ return result;
++}
++
++#define vmull_high_lane_s16(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x4_t c_ = (c); \
+- int16x8_t b_ = (b); \
+- int32x4_t a_ = (a); \
++ int16x4_t b_ = (b); \
++ int16x8_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_lane_s32(a, b, c, d) \
++#define vmull_high_lane_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x2_t c_ = (c); \
+- int32x4_t b_ = (b); \
+- int64x2_t a_ = (a); \
++ int32x2_t b_ = (b); \
++ int32x4_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_lane_u16(a, b, c, d) \
++#define vmull_high_lane_u16(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x4_t c_ = (c); \
+- uint16x8_t b_ = (b); \
+- uint32x4_t a_ = (a); \
++ uint16x4_t b_ = (b); \
++ uint16x8_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_lane_u32(a, b, c, d) \
++#define vmull_high_lane_u32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x2_t c_ = (c); \
+- uint32x4_t b_ = (b); \
+- uint64x2_t a_ = (a); \
++ uint32x2_t b_ = (b); \
++ uint32x4_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_laneq_s16(a, b, c, d) \
++#define vmull_high_laneq_s16(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x8_t c_ = (c); \
+ int16x8_t b_ = (b); \
+- int32x4_t a_ = (a); \
++ int16x8_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_laneq_s32(a, b, c, d) \
++#define vmull_high_laneq_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x4_t c_ = (c); \
+ int32x4_t b_ = (b); \
+- int64x2_t a_ = (a); \
++ int32x4_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_laneq_u16(a, b, c, d) \
++#define vmull_high_laneq_u16(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x8_t c_ = (c); \
+ uint16x8_t b_ = (b); \
+- uint32x4_t a_ = (a); \
++ uint16x8_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[%4]" \
++ __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_high_laneq_u32(a, b, c, d) \
++#define vmull_high_laneq_u32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x4_t c_ = (c); \
+ uint32x4_t b_ = (b); \
+- uint64x2_t a_ = (a); \
++ uint32x4_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[%4]" \
++ __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsl_high_n_s16 (int32x4_t a, int16x8_t b, int16_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_n_s16 (int16x8_t a, int16_t b)
+ {
+ int32x4_t result;
+- __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[0]"
++ __asm__ ("smull2 %0.4s,%1.8h,%2.h[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlsl_high_n_s32 (int64x2_t a, int32x4_t b, int32_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_n_s32 (int32x4_t a, int32_t b)
+ {
+ int64x2_t result;
+- __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[0]"
++ __asm__ ("smull2 %0.2d,%1.4s,%2.s[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsl_high_n_u16 (uint32x4_t a, uint16x8_t b, uint16_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_n_u16 (uint16x8_t a, uint16_t b)
+ {
+ uint32x4_t result;
+- __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[0]"
++ __asm__ ("umull2 %0.4s,%1.8h,%2.h[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlsl_high_n_u32 (uint64x2_t a, uint32x4_t b, uint32_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_n_u32 (uint32x4_t a, uint32_t b)
+ {
+ uint64x2_t result;
+- __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[0]"
++ __asm__ ("umull2 %0.2d,%1.4s,%2.s[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsl_high_s8 (int16x8_t a, int8x16_t b, int8x16_t c)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_p8 (poly8x16_t a, poly8x16_t b)
+ {
+- int16x8_t result;
+- __asm__ ("smlsl2 %0.8h,%2.16b,%3.16b"
++ poly16x8_t result;
++ __asm__ ("pmull2 %0.8h,%1.16b,%2.16b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsl_high_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_s8 (int8x16_t a, int8x16_t b)
+ {
+- int32x4_t result;
+- __asm__ ("smlsl2 %0.4s,%2.8h,%3.8h"
++ int16x8_t result;
++ __asm__ ("smull2 %0.8h,%1.16b,%2.16b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlsl_high_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_s16 (int16x8_t a, int16x8_t b)
+ {
+- int64x2_t result;
+- __asm__ ("smlsl2 %0.2d,%2.4s,%3.4s"
+- : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ int32x4_t result;
++ __asm__ ("smull2 %0.4s,%1.8h,%2.8h"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsl_high_u8 (uint16x8_t a, uint8x16_t b, uint8x16_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_s32 (int32x4_t a, int32x4_t b)
++{
++ int64x2_t result;
++ __asm__ ("smull2 %0.2d,%1.4s,%2.4s"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_u8 (uint8x16_t a, uint8x16_t b)
+ {
+ uint16x8_t result;
+- __asm__ ("umlsl2 %0.8h,%2.16b,%3.16b"
++ __asm__ ("umull2 %0.8h,%1.16b,%2.16b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsl_high_u16 (uint32x4_t a, uint16x8_t b, uint16x8_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_u16 (uint16x8_t a, uint16x8_t b)
+ {
+ uint32x4_t result;
+- __asm__ ("umlsl2 %0.4s,%2.8h,%3.8h"
++ __asm__ ("umull2 %0.4s,%1.8h,%2.8h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlsl_high_u32 (uint64x2_t a, uint32x4_t b, uint32x4_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_u32 (uint32x4_t a, uint32x4_t b)
+ {
+ uint64x2_t result;
+- __asm__ ("umlsl2 %0.2d,%2.4s,%3.4s"
++ __asm__ ("umull2 %0.2d,%1.4s,%2.4s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-#define vmlsl_lane_s16(a, b, c, d) \
++#define vmull_lane_s16(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x4_t c_ = (c); \
+ int16x4_t b_ = (b); \
+- int32x4_t a_ = (a); \
++ int16x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlsl %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("smull %0.4s,%1.4h,%2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_lane_s32(a, b, c, d) \
++#define vmull_lane_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x2_t c_ = (c); \
+ int32x2_t b_ = (b); \
+- int64x2_t a_ = (a); \
++ int32x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlsl %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("smull %0.2d,%1.2s,%2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_lane_u16(a, b, c, d) \
++#define vmull_lane_u16(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x4_t c_ = (c); \
+ uint16x4_t b_ = (b); \
+- uint32x4_t a_ = (a); \
++ uint16x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlsl %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("umull %0.4s,%1.4h,%2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_lane_u32(a, b, c, d) \
++#define vmull_lane_u32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x2_t c_ = (c); \
+ uint32x2_t b_ = (b); \
+- uint64x2_t a_ = (a); \
++ uint32x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlsl %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("umull %0.2d, %1.2s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_laneq_s16(a, b, c, d) \
++#define vmull_laneq_s16(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x8_t c_ = (c); \
+- int16x4_t b_ = (b); \
+- int32x4_t a_ = (a); \
++ int16x8_t b_ = (b); \
++ int16x4_t a_ = (a); \
+ int32x4_t result; \
+- __asm__ ("smlsl %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("smull %0.4s, %1.4h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_laneq_s32(a, b, c, d) \
++#define vmull_laneq_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x4_t c_ = (c); \
+- int32x2_t b_ = (b); \
+- int64x2_t a_ = (a); \
++ int32x4_t b_ = (b); \
++ int32x2_t a_ = (a); \
+ int64x2_t result; \
+- __asm__ ("smlsl %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("smull %0.2d, %1.2s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_laneq_u16(a, b, c, d) \
++#define vmull_laneq_u16(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x8_t c_ = (c); \
+- uint16x4_t b_ = (b); \
+- uint32x4_t a_ = (a); \
++ uint16x8_t b_ = (b); \
++ uint16x4_t a_ = (a); \
+ uint32x4_t result; \
+- __asm__ ("umlsl %0.4s, %2.4h, %3.h[%4]" \
++ __asm__ ("umull %0.4s, %1.4h, %2.h[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "x"(c_), "i"(d) \
++ : "w"(a_), "x"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmlsl_laneq_u32(a, b, c, d) \
++#define vmull_laneq_u32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x4_t c_ = (c); \
+- uint32x2_t b_ = (b); \
+- uint64x2_t a_ = (a); \
++ uint32x4_t b_ = (b); \
++ uint32x2_t a_ = (a); \
+ uint64x2_t result; \
+- __asm__ ("umlsl %0.2d, %2.2s, %3.s[%4]" \
++ __asm__ ("umull %0.2d, %1.2s, %2.s[%3]" \
+ : "=w"(result) \
+- : "0"(a_), "w"(b_), "w"(c_), "i"(d) \
++ : "w"(a_), "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsl_n_s16 (int32x4_t a, int16x4_t b, int16_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_n_s16 (int16x4_t a, int16_t b)
+ {
+ int32x4_t result;
+- __asm__ ("smlsl %0.4s, %2.4h, %3.h[0]"
++ __asm__ ("smull %0.4s,%1.4h,%2.h[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlsl_n_s32 (int64x2_t a, int32x2_t b, int32_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_n_s32 (int32x2_t a, int32_t b)
+ {
+ int64x2_t result;
+- __asm__ ("smlsl %0.2d, %2.2s, %3.s[0]"
++ __asm__ ("smull %0.2d,%1.2s,%2.s[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsl_n_u16 (uint32x4_t a, uint16x4_t b, uint16_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_n_u16 (uint16x4_t a, uint16_t b)
+ {
+ uint32x4_t result;
+- __asm__ ("umlsl %0.4s, %2.4h, %3.h[0]"
++ __asm__ ("umull %0.4s,%1.4h,%2.h[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlsl_n_u32 (uint64x2_t a, uint32x2_t b, uint32_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_n_u32 (uint32x2_t a, uint32_t b)
+ {
+ uint64x2_t result;
+- __asm__ ("umlsl %0.2d, %2.2s, %3.s[0]"
++ __asm__ ("umull %0.2d,%1.2s,%2.s[0]"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsl_s8 (int16x8_t a, int8x8_t b, int8x8_t c)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_p8 (poly8x8_t a, poly8x8_t b)
++{
++ poly16x8_t result;
++ __asm__ ("pmull %0.8h, %1.8b, %2.8b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_s8 (int8x8_t a, int8x8_t b)
+ {
+ int16x8_t result;
+- __asm__ ("smlsl %0.8h, %2.8b, %3.8b"
++ __asm__ ("smull %0.8h, %1.8b, %2.8b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsl_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_s16 (int16x4_t a, int16x4_t b)
+ {
+ int32x4_t result;
+- __asm__ ("smlsl %0.4s, %2.4h, %3.4h"
++ __asm__ ("smull %0.4s, %1.4h, %2.4h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmlsl_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_s32 (int32x2_t a, int32x2_t b)
+ {
+ int64x2_t result;
+- __asm__ ("smlsl %0.2d, %2.2s, %3.2s"
++ __asm__ ("smull %0.2d, %1.2s, %2.2s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsl_u8 (uint16x8_t a, uint8x8_t b, uint8x8_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_u8 (uint8x8_t a, uint8x8_t b)
+ {
+ uint16x8_t result;
+- __asm__ ("umlsl %0.8h, %2.8b, %3.8b"
++ __asm__ ("umull %0.8h, %1.8b, %2.8b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsl_u16 (uint32x4_t a, uint16x4_t b, uint16x4_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_u16 (uint16x4_t a, uint16x4_t b)
+ {
+ uint32x4_t result;
+- __asm__ ("umlsl %0.4s, %2.4h, %3.4h"
++ __asm__ ("umull %0.4s, %1.4h, %2.4h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmlsl_u32 (uint64x2_t a, uint32x2_t b, uint32x2_t c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_u32 (uint32x2_t a, uint32x2_t b)
+ {
+ uint64x2_t result;
+- __asm__ ("umlsl %0.2d, %2.2s, %3.2s"
++ __asm__ ("umull %0.2d, %1.2s, %2.2s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlsq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_s8 (int16x4_t a, int8x8_t b)
+ {
+- float32x4_t result;
+- float32x4_t t1;
+- __asm__ ("fmul %1.4s, %3.4s, %4.s[0]; fsub %0.4s, %0.4s, %1.4s"
+- : "=w"(result), "=w"(t1)
+- : "0"(a), "w"(b), "w"(c)
++ int16x4_t result;
++ __asm__ ("sadalp %0.4h,%2.8b"
++ : "=w"(result)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_s16 (int32x2_t a, int16x4_t b)
+ {
+- int16x8_t result;
+- __asm__ ("mls %0.8h, %2.8h, %3.h[0]"
++ int32x2_t result;
++ __asm__ ("sadalp %0.2s,%2.4h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsq_n_s32 (int32x4_t a, int32x4_t b, int32_t c)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_s32 (int64x1_t a, int32x2_t b)
+ {
+- int32x4_t result;
+- __asm__ ("mls %0.4s, %2.4s, %3.s[0]"
++ int64x1_t result;
++ __asm__ ("sadalp %0.1d,%2.2s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsq_n_u16 (uint16x8_t a, uint16x8_t b, uint16_t c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_u8 (uint16x4_t a, uint8x8_t b)
+ {
+- uint16x8_t result;
+- __asm__ ("mls %0.8h, %2.8h, %3.h[0]"
++ uint16x4_t result;
++ __asm__ ("uadalp %0.4h,%2.8b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "x"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsq_n_u32 (uint32x4_t a, uint32x4_t b, uint32_t c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_u16 (uint32x2_t a, uint16x4_t b)
+ {
+- uint32x4_t result;
+- __asm__ ("mls %0.4s, %2.4s, %3.s[0]"
++ uint32x2_t result;
++ __asm__ ("uadalp %0.2s,%2.4h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmlsq_s8 (int8x16_t a, int8x16_t b, int8x16_t c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadal_u32 (uint64x1_t a, uint32x2_t b)
+ {
+- int8x16_t result;
+- __asm__ ("mls %0.16b,%2.16b,%3.16b"
++ uint64x1_t result;
++ __asm__ ("uadalp %0.1d,%2.2s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsq_s16 (int16x8_t a, int16x8_t b, int16x8_t c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_s8 (int16x8_t a, int8x16_t b)
+ {
+ int16x8_t result;
+- __asm__ ("mls %0.8h,%2.8h,%3.8h"
++ __asm__ ("sadalp %0.8h,%2.16b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsq_s32 (int32x4_t a, int32x4_t b, int32x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_s16 (int32x4_t a, int16x8_t b)
+ {
+ int32x4_t result;
+- __asm__ ("mls %0.4s,%2.4s,%3.4s"
++ __asm__ ("sadalp %0.4s,%2.8h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmlsq_u8 (uint8x16_t a, uint8x16_t b, uint8x16_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_s32 (int64x2_t a, int32x4_t b)
+ {
+- uint8x16_t result;
+- __asm__ ("mls %0.16b,%2.16b,%3.16b"
++ int64x2_t result;
++ __asm__ ("sadalp %0.2d,%2.4s"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsq_u16 (uint16x8_t a, uint16x8_t b, uint16x8_t c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_u8 (uint16x8_t a, uint8x16_t b)
+ {
+ uint16x8_t result;
+- __asm__ ("mls %0.8h,%2.8h,%3.8h"
++ __asm__ ("uadalp %0.8h,%2.16b"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_u16 (uint32x4_t a, uint16x8_t b)
+ {
+ uint32x4_t result;
+- __asm__ ("mls %0.4s,%2.4s,%3.4s"
++ __asm__ ("uadalp %0.4s,%2.8h"
+ : "=w"(result)
+- : "0"(a), "w"(b), "w"(c)
++ : "0"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmovl_high_s8 (int8x16_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadalq_u32 (uint64x2_t a, uint32x4_t b)
+ {
+- int16x8_t result;
+- __asm__ ("sshll2 %0.8h,%1.16b,#0"
++ uint64x2_t result;
++ __asm__ ("uadalp %0.2d,%2.4s"
++ : "=w"(result)
++ : "0"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_s8 (int8x8_t a)
++{
++ int16x4_t result;
++ __asm__ ("saddlp %0.4h,%1.8b"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmovl_high_s16 (int16x8_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_s16 (int16x4_t a)
+ {
+- int32x4_t result;
+- __asm__ ("sshll2 %0.4s,%1.8h,#0"
++ int32x2_t result;
++ __asm__ ("saddlp %0.2s,%1.4h"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmovl_high_s32 (int32x4_t a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_s32 (int32x2_t a)
+ {
+- int64x2_t result;
+- __asm__ ("sshll2 %0.2d,%1.4s,#0"
++ int64x1_t result;
++ __asm__ ("saddlp %0.1d,%1.2s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmovl_high_u8 (uint8x16_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_u8 (uint8x8_t a)
+ {
+- uint16x8_t result;
+- __asm__ ("ushll2 %0.8h,%1.16b,#0"
++ uint16x4_t result;
++ __asm__ ("uaddlp %0.4h,%1.8b"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmovl_high_u16 (uint16x8_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_u16 (uint16x4_t a)
+ {
+- uint32x4_t result;
+- __asm__ ("ushll2 %0.4s,%1.8h,#0"
++ uint32x2_t result;
++ __asm__ ("uaddlp %0.2s,%1.4h"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmovl_high_u32 (uint32x4_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddl_u32 (uint32x2_t a)
+ {
+- uint64x2_t result;
+- __asm__ ("ushll2 %0.2d,%1.4s,#0"
++ uint64x1_t result;
++ __asm__ ("uaddlp %0.1d,%1.2s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmovl_s8 (int8x8_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_s8 (int8x16_t a)
+ {
+ int16x8_t result;
+- __asm__ ("sshll %0.8h,%1.8b,#0"
++ __asm__ ("saddlp %0.8h,%1.16b"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmovl_s16 (int16x4_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_s16 (int16x8_t a)
+ {
+ int32x4_t result;
+- __asm__ ("sshll %0.4s,%1.4h,#0"
++ __asm__ ("saddlp %0.4s,%1.8h"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmovl_s32 (int32x2_t a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_s32 (int32x4_t a)
+ {
+ int64x2_t result;
+- __asm__ ("sshll %0.2d,%1.2s,#0"
++ __asm__ ("saddlp %0.2d,%1.4s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmovl_u8 (uint8x8_t a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_u8 (uint8x16_t a)
+ {
+ uint16x8_t result;
+- __asm__ ("ushll %0.8h,%1.8b,#0"
++ __asm__ ("uaddlp %0.8h,%1.16b"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmovl_u16 (uint16x4_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_u16 (uint16x8_t a)
+ {
+ uint32x4_t result;
+- __asm__ ("ushll %0.4s,%1.4h,#0"
++ __asm__ ("uaddlp %0.4s,%1.8h"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmovl_u32 (uint32x2_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddlq_u32 (uint32x4_t a)
+ {
+ uint64x2_t result;
+- __asm__ ("ushll %0.2d,%1.2s,#0"
++ __asm__ ("uaddlp %0.2d,%1.4s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmovn_high_s16 (int8x8_t a, int16x8_t b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_s8 (int8x16_t a, int8x16_t b)
+ {
+- int8x16_t result = vcombine_s8 (a, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.16b,%1.8h"
+- : "+w"(result)
+- : "w"(b)
++ int8x16_t result;
++ __asm__ ("addp %0.16b,%1.16b,%2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmovn_high_s32 (int16x4_t a, int32x4_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_s16 (int16x8_t a, int16x8_t b)
+ {
+- int16x8_t result = vcombine_s16 (a, vcreate_s16 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.8h,%1.4s"
+- : "+w"(result)
+- : "w"(b)
++ int16x8_t result;
++ __asm__ ("addp %0.8h,%1.8h,%2.8h"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmovn_high_s64 (int32x2_t a, int64x2_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_s32 (int32x4_t a, int32x4_t b)
+ {
+- int32x4_t result = vcombine_s32 (a, vcreate_s32 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.4s,%1.2d"
+- : "+w"(result)
+- : "w"(b)
++ int32x4_t result;
++ __asm__ ("addp %0.4s,%1.4s,%2.4s"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmovn_high_u16 (uint8x8_t a, uint16x8_t b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_s64 (int64x2_t a, int64x2_t b)
+ {
+- uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.16b,%1.8h"
+- : "+w"(result)
+- : "w"(b)
++ int64x2_t result;
++ __asm__ ("addp %0.2d,%1.2d,%2.2d"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmovn_high_u32 (uint16x4_t a, uint32x4_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_u8 (uint8x16_t a, uint8x16_t b)
+ {
+- uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.8h,%1.4s"
+- : "+w"(result)
+- : "w"(b)
++ uint8x16_t result;
++ __asm__ ("addp %0.16b,%1.16b,%2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmovn_high_u64 (uint32x2_t a, uint64x2_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_u16 (uint16x8_t a, uint16x8_t b)
+ {
+- uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("xtn2 %0.4s,%1.2d"
+- : "+w"(result)
+- : "w"(b)
++ uint16x8_t result;
++ __asm__ ("addp %0.8h,%1.8h,%2.8h"
++ : "=w"(result)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmovn_s16 (int16x8_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_u32 (uint32x4_t a, uint32x4_t b)
+ {
+- int8x8_t result;
+- __asm__ ("xtn %0.8b,%1.8h"
++ uint32x4_t result;
++ __asm__ ("addp %0.4s,%1.4s,%2.4s"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmovn_s32 (int32x4_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_u64 (uint64x2_t a, uint64x2_t b)
+ {
+- int16x4_t result;
+- __asm__ ("xtn %0.4h,%1.4s"
++ uint64x2_t result;
++ __asm__ ("addp %0.2d,%1.2d,%2.2d"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmovn_s64 (int64x2_t a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_n_s16 (int16x4_t a, int16_t b)
+ {
+- int32x2_t result;
+- __asm__ ("xtn %0.2s,%1.2d"
++ int16x4_t result;
++ __asm__ ("sqdmulh %0.4h,%1.4h,%2.h[0]"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmovn_u16 (uint16x8_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_n_s32 (int32x2_t a, int32_t b)
+ {
+- uint8x8_t result;
+- __asm__ ("xtn %0.8b,%1.8h"
++ int32x2_t result;
++ __asm__ ("sqdmulh %0.2s,%1.2s,%2.s[0]"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmovn_u32 (uint32x4_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_n_s16 (int16x8_t a, int16_t b)
+ {
+- uint16x4_t result;
+- __asm__ ("xtn %0.4h,%1.4s"
++ int16x8_t result;
++ __asm__ ("sqdmulh %0.8h,%1.8h,%2.h[0]"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmovn_u64 (uint64x2_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_n_s32 (int32x4_t a, int32_t b)
+ {
+- uint32x2_t result;
+- __asm__ ("xtn %0.2s,%1.2d"
++ int32x4_t result;
++ __asm__ ("sqdmulh %0.4s,%1.4s,%2.s[0]"
+ : "=w"(result)
+- : "w"(a)
++ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmul_n_f32 (float32x2_t a, float32_t b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_s16 (int8x8_t a, int16x8_t b)
+ {
+- float32x2_t result;
+- __asm__ ("fmul %0.2s,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
++ int8x16_t result = vcombine_s8 (a, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtn2 %0.16b, %1.8h"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmul_n_s16 (int16x4_t a, int16_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_s32 (int16x4_t a, int32x4_t b)
+ {
+- int16x4_t result;
+- __asm__ ("mul %0.4h,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
++ int16x8_t result = vcombine_s16 (a, vcreate_s16 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtn2 %0.8h, %1.4s"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmul_n_s32 (int32x2_t a, int32_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_s64 (int32x2_t a, int64x2_t b)
+ {
+- int32x2_t result;
+- __asm__ ("mul %0.2s,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
++ int32x4_t result = vcombine_s32 (a, vcreate_s32 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtn2 %0.4s, %1.2d"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmul_n_u16 (uint16x4_t a, uint16_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_u16 (uint8x8_t a, uint16x8_t b)
+ {
+- uint16x4_t result;
+- __asm__ ("mul %0.4h,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
++ uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("uqxtn2 %0.16b, %1.8h"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmul_n_u32 (uint32x2_t a, uint32_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_u32 (uint16x4_t a, uint32x4_t b)
+ {
+- uint32x2_t result;
+- __asm__ ("mul %0.2s,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
++ uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("uqxtn2 %0.8h, %1.4s"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-#define vmull_high_lane_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x4_t b_ = (b); \
+- int16x8_t a_ = (a); \
+- int32x4_t result; \
+- __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_high_u64 (uint32x2_t a, uint64x2_t b)
++{
++ uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("uqxtn2 %0.4s, %1.2d"
++ : "+w"(result)
++ : "w"(b)
++ : /* No clobbers */);
++ return result;
++}
+
+-#define vmull_high_lane_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x2_t b_ = (b); \
+- int32x4_t a_ = (a); \
+- int64x2_t result; \
+- __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_lane_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x4_t b_ = (b); \
+- uint16x8_t a_ = (a); \
+- uint32x4_t result; \
+- __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_lane_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x2_t b_ = (b); \
+- uint32x4_t a_ = (a); \
+- uint64x2_t result; \
+- __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_laneq_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- int16x8_t a_ = (a); \
+- int32x4_t result; \
+- __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_laneq_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- int32x4_t a_ = (a); \
+- int64x2_t result; \
+- __asm__ ("smull2 %0.2d, %1.4s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_laneq_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x8_t b_ = (b); \
+- uint16x8_t a_ = (a); \
+- uint32x4_t result; \
+- __asm__ ("umull2 %0.4s, %1.8h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vmull_high_laneq_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x4_t b_ = (b); \
+- uint32x4_t a_ = (a); \
+- uint64x2_t result; \
+- __asm__ ("umull2 %0.2d, %1.4s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmull_high_n_s16 (int16x8_t a, int16_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_high_s16 (uint8x8_t a, int16x8_t b)
+ {
+- int32x4_t result;
+- __asm__ ("smull2 %0.4s,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
++ uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtun2 %0.16b, %1.8h"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmull_high_n_s32 (int32x4_t a, int32_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_high_s32 (uint16x4_t a, int32x4_t b)
+ {
+- int64x2_t result;
+- __asm__ ("smull2 %0.2d,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
++ uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtun2 %0.8h, %1.4s"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmull_high_n_u16 (uint16x8_t a, uint16_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_high_s64 (uint32x2_t a, int64x2_t b)
+ {
+- uint32x4_t result;
+- __asm__ ("umull2 %0.4s,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
++ uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("sqxtun2 %0.4s, %1.2d"
++ : "+w"(result)
++ : "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmull_high_n_u32 (uint32x4_t a, uint32_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_n_s16 (int16x4_t a, int16_t b)
+ {
+- uint64x2_t result;
+- __asm__ ("umull2 %0.2d,%1.4s,%2.s[0]"
++ int16x4_t result;
++ __asm__ ("sqrdmulh %0.4h,%1.4h,%2.h[0]"
+ : "=w"(result)
+- : "w"(a), "w"(b)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vmull_high_p8 (poly8x16_t a, poly8x16_t b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_n_s32 (int32x2_t a, int32_t b)
+ {
+- poly16x8_t result;
+- __asm__ ("pmull2 %0.8h,%1.16b,%2.16b"
++ int32x2_t result;
++ __asm__ ("sqrdmulh %0.2s,%1.2s,%2.s[0]"
+ : "=w"(result)
+ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmull_high_s8 (int8x16_t a, int8x16_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_n_s16 (int16x8_t a, int16_t b)
+ {
+ int16x8_t result;
+- __asm__ ("smull2 %0.8h,%1.16b,%2.16b"
++ __asm__ ("sqrdmulh %0.8h,%1.8h,%2.h[0]"
+ : "=w"(result)
+- : "w"(a), "w"(b)
++ : "w"(a), "x"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmull_high_s16 (int16x8_t a, int16x8_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_n_s32 (int32x4_t a, int32_t b)
+ {
+ int32x4_t result;
+- __asm__ ("smull2 %0.4s,%1.8h,%2.8h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmull_high_s32 (int32x4_t a, int32x4_t b)
+-{
+- int64x2_t result;
+- __asm__ ("smull2 %0.2d,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmull_high_u8 (uint8x16_t a, uint8x16_t b)
+-{
+- uint16x8_t result;
+- __asm__ ("umull2 %0.8h,%1.16b,%2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmull_high_u16 (uint16x8_t a, uint16x8_t b)
+-{
+- uint32x4_t result;
+- __asm__ ("umull2 %0.4s,%1.8h,%2.8h"
++ __asm__ ("sqrdmulh %0.4s,%1.4s,%2.s[0]"
+ : "=w"(result)
+ : "w"(a), "w"(b)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmull_high_u32 (uint32x4_t a, uint32x4_t b)
+-{
+- uint64x2_t result;
+- __asm__ ("umull2 %0.2d,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqrshrn_high_n_s16(a, b, c) \
++ __extension__ \
++ ({ \
++ int16x8_t b_ = (b); \
++ int8x8_t a_ = (a); \
++ int8x16_t result = vcombine_s8 \
++ (a_, vcreate_s8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrn2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-#define vmull_lane_s16(a, b, c) \
++#define vqrshrn_high_n_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x4_t b_ = (b); \
++ int32x4_t b_ = (b); \
+ int16x4_t a_ = (a); \
+- int32x4_t result; \
+- __asm__ ("smull %0.4s,%1.4h,%2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
++ int16x8_t result = vcombine_s16 \
++ (a_, vcreate_s16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrn2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_lane_s32(a, b, c) \
++#define vqrshrn_high_n_s64(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x2_t b_ = (b); \
++ int64x2_t b_ = (b); \
+ int32x2_t a_ = (a); \
+- int64x2_t result; \
+- __asm__ ("smull %0.2d,%1.2s,%2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
++ int32x4_t result = vcombine_s32 \
++ (a_, vcreate_s32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrn2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_lane_u16(a, b, c) \
++#define vqrshrn_high_n_u16(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint32x4_t result; \
+- __asm__ ("umull %0.4s,%1.4h,%2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
++ uint16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqrshrn2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_lane_u32(a, b, c) \
++#define vqrshrn_high_n_u32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint64x2_t result; \
+- __asm__ ("umull %0.2d, %1.2s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
++ uint32x4_t b_ = (b); \
++ uint16x4_t a_ = (a); \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqrshrn2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_laneq_s16(a, b, c) \
++#define vqrshrn_high_n_u64(a, b, c) \
+ __extension__ \
+ ({ \
+- int16x8_t b_ = (b); \
+- int16x4_t a_ = (a); \
+- int32x4_t result; \
+- __asm__ ("smull %0.4s, %1.4h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
++ uint64x2_t b_ = (b); \
++ uint32x2_t a_ = (a); \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqrshrn2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_laneq_s32(a, b, c) \
++#define vqrshrun_high_n_s16(a, b, c) \
+ __extension__ \
+ ({ \
+- int32x4_t b_ = (b); \
+- int32x2_t a_ = (a); \
+- int64x2_t result; \
+- __asm__ ("smull %0.2d, %1.2s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
++ int16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrun2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_laneq_u16(a, b, c) \
++#define vqrshrun_high_n_s32(a, b, c) \
+ __extension__ \
+ ({ \
+- uint16x8_t b_ = (b); \
++ int32x4_t b_ = (b); \
+ uint16x4_t a_ = (a); \
+- uint32x4_t result; \
+- __asm__ ("umull %0.4s, %1.4h, %2.h[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "x"(b_), "i"(c) \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrun2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-#define vmull_laneq_u32(a, b, c) \
++#define vqrshrun_high_n_s64(a, b, c) \
+ __extension__ \
+ ({ \
+- uint32x4_t b_ = (b); \
++ int64x2_t b_ = (b); \
+ uint32x2_t a_ = (a); \
+- uint64x2_t result; \
+- __asm__ ("umull %0.2d, %1.2s, %2.s[%3]" \
+- : "=w"(result) \
+- : "w"(a_), "w"(b_), "i"(c) \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqrshrun2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
+ : /* No clobbers */); \
+ result; \
+ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmull_n_s16 (int16x4_t a, int16_t b)
+-{
+- int32x4_t result;
+- __asm__ ("smull %0.4s,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
+-}
+-
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmull_n_s32 (int32x2_t a, int32_t b)
+-{
+- int64x2_t result;
+- __asm__ ("smull %0.2d,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_s16(a, b, c) \
++ __extension__ \
++ ({ \
++ int16x8_t b_ = (b); \
++ int8x8_t a_ = (a); \
++ int8x16_t result = vcombine_s8 \
++ (a_, vcreate_s8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrn2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmull_n_u16 (uint16x4_t a, uint16_t b)
+-{
+- uint32x4_t result;
+- __asm__ ("umull %0.4s,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_s32(a, b, c) \
++ __extension__ \
++ ({ \
++ int32x4_t b_ = (b); \
++ int16x4_t a_ = (a); \
++ int16x8_t result = vcombine_s16 \
++ (a_, vcreate_s16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrn2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmull_n_u32 (uint32x2_t a, uint32_t b)
+-{
+- uint64x2_t result;
+- __asm__ ("umull %0.2d,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_s64(a, b, c) \
++ __extension__ \
++ ({ \
++ int64x2_t b_ = (b); \
++ int32x2_t a_ = (a); \
++ int32x4_t result = vcombine_s32 \
++ (a_, vcreate_s32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrn2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vmull_p8 (poly8x8_t a, poly8x8_t b)
+-{
+- poly16x8_t result;
+- __asm__ ("pmull %0.8h, %1.8b, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_u16(a, b, c) \
++ __extension__ \
++ ({ \
++ uint16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqshrn2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmull_s8 (int8x8_t a, int8x8_t b)
+-{
+- int16x8_t result;
+- __asm__ ("smull %0.8h, %1.8b, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_u32(a, b, c) \
++ __extension__ \
++ ({ \
++ uint32x4_t b_ = (b); \
++ uint16x4_t a_ = (a); \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqshrn2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmull_s16 (int16x4_t a, int16x4_t b)
+-{
+- int32x4_t result;
+- __asm__ ("smull %0.4s, %1.4h, %2.4h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrn_high_n_u64(a, b, c) \
++ __extension__ \
++ ({ \
++ uint64x2_t b_ = (b); \
++ uint32x2_t a_ = (a); \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("uqshrn2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmull_s32 (int32x2_t a, int32x2_t b)
+-{
+- int64x2_t result;
+- __asm__ ("smull %0.2d, %1.2s, %2.2s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrun_high_n_s16(a, b, c) \
++ __extension__ \
++ ({ \
++ int16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrun2 %0.16b, %1.8h, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmull_u8 (uint8x8_t a, uint8x8_t b)
+-{
+- uint16x8_t result;
+- __asm__ ("umull %0.8h, %1.8b, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrun_high_n_s32(a, b, c) \
++ __extension__ \
++ ({ \
++ int32x4_t b_ = (b); \
++ uint16x4_t a_ = (a); \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrun2 %0.8h, %1.4s, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmull_u16 (uint16x4_t a, uint16x4_t b)
+-{
+- uint32x4_t result;
+- __asm__ ("umull %0.4s, %1.4h, %2.4h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vqshrun_high_n_s64(a, b, c) \
++ __extension__ \
++ ({ \
++ int64x2_t b_ = (b); \
++ uint32x2_t a_ = (a); \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("sqshrun2 %0.4s, %1.2d, #%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmull_u32 (uint32x2_t a, uint32x2_t b)
+-{
+- uint64x2_t result;
+- __asm__ ("umull %0.2d, %1.2s, %2.2s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_s16(a, b, c) \
++ __extension__ \
++ ({ \
++ int16x8_t b_ = (b); \
++ int8x8_t a_ = (a); \
++ int8x16_t result = vcombine_s8 \
++ (a_, vcreate_s8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.16b,%1.8h,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulq_n_f32 (float32x4_t a, float32_t b)
+-{
+- float32x4_t result;
+- __asm__ ("fmul %0.4s,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_s32(a, b, c) \
++ __extension__ \
++ ({ \
++ int32x4_t b_ = (b); \
++ int16x4_t a_ = (a); \
++ int16x8_t result = vcombine_s16 \
++ (a_, vcreate_s16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.8h,%1.4s,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulq_n_f64 (float64x2_t a, float64_t b)
+-{
+- float64x2_t result;
+- __asm__ ("fmul %0.2d,%1.2d,%2.d[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_s64(a, b, c) \
++ __extension__ \
++ ({ \
++ int64x2_t b_ = (b); \
++ int32x2_t a_ = (a); \
++ int32x4_t result = vcombine_s32 \
++ (a_, vcreate_s32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.4s,%1.2d,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmulq_n_s16 (int16x8_t a, int16_t b)
+-{
+- int16x8_t result;
+- __asm__ ("mul %0.8h,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_u16(a, b, c) \
++ __extension__ \
++ ({ \
++ uint16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.16b,%1.8h,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmulq_n_s32 (int32x4_t a, int32_t b)
+-{
+- int32x4_t result;
+- __asm__ ("mul %0.4s,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_u32(a, b, c) \
++ __extension__ \
++ ({ \
++ uint32x4_t b_ = (b); \
++ uint16x4_t a_ = (a); \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.8h,%1.4s,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmulq_n_u16 (uint16x8_t a, uint16_t b)
+-{
+- uint16x8_t result;
+- __asm__ ("mul %0.8h,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_high_n_u64(a, b, c) \
++ __extension__ \
++ ({ \
++ uint64x2_t b_ = (b); \
++ uint32x2_t a_ = (a); \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("rshrn2 %0.4s,%1.2d,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmulq_n_u32 (uint32x4_t a, uint32_t b)
+-{
+- uint32x4_t result;
+- __asm__ ("mul %0.4s,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_n_s16(a, b) \
++ __extension__ \
++ ({ \
++ int16x8_t a_ = (a); \
++ int8x8_t result; \
++ __asm__ ("rshrn %0.8b,%1.8h,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vmvn_p8 (poly8x8_t a)
+-{
+- poly8x8_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_n_s32(a, b) \
++ __extension__ \
++ ({ \
++ int32x4_t a_ = (a); \
++ int16x4_t result; \
++ __asm__ ("rshrn %0.4h,%1.4s,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmvn_s8 (int8x8_t a)
+-{
+- int8x8_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_n_s64(a, b) \
++ __extension__ \
++ ({ \
++ int64x2_t a_ = (a); \
++ int32x2_t result; \
++ __asm__ ("rshrn %0.2s,%1.2d,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmvn_s16 (int16x4_t a)
+-{
+- int16x4_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_n_u16(a, b) \
++ __extension__ \
++ ({ \
++ uint16x8_t a_ = (a); \
++ uint8x8_t result; \
++ __asm__ ("rshrn %0.8b,%1.8h,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmvn_s32 (int32x2_t a)
+-{
+- int32x2_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
+-}
++#define vrshrn_n_u32(a, b) \
++ __extension__ \
++ ({ \
++ uint32x4_t a_ = (a); \
++ uint16x4_t result; \
++ __asm__ ("rshrn %0.4h,%1.4s,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vrshrn_n_u64(a, b) \
++ __extension__ \
++ ({ \
++ uint64x2_t a_ = (a); \
++ uint32x2_t result; \
++ __asm__ ("rshrn %0.2s,%1.2d,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmvn_u8 (uint8x8_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrte_u32 (uint32x2_t a)
+ {
+- uint8x8_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
++ uint32x2_t result;
++ __asm__ ("ursqrte %0.2s,%1.2s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmvn_u16 (uint16x4_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrteq_u32 (uint32x4_t a)
+ {
+- uint16x4_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
++ uint32x4_t result;
++ __asm__ ("ursqrte %0.4s,%1.4s"
+ : "=w"(result)
+ : "w"(a)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmvn_u32 (uint32x2_t a)
+-{
+- uint32x2_t result;
+- __asm__ ("mvn %0.8b,%1.8b"
+- : "=w"(result)
+- : "w"(a)
++#define vshrn_high_n_s16(a, b, c) \
++ __extension__ \
++ ({ \
++ int16x8_t b_ = (b); \
++ int8x8_t a_ = (a); \
++ int8x16_t result = vcombine_s8 \
++ (a_, vcreate_s8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.16b,%1.8h,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_high_n_s32(a, b, c) \
++ __extension__ \
++ ({ \
++ int32x4_t b_ = (b); \
++ int16x4_t a_ = (a); \
++ int16x8_t result = vcombine_s16 \
++ (a_, vcreate_s16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.8h,%1.4s,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_high_n_s64(a, b, c) \
++ __extension__ \
++ ({ \
++ int64x2_t b_ = (b); \
++ int32x2_t a_ = (a); \
++ int32x4_t result = vcombine_s32 \
++ (a_, vcreate_s32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.4s,%1.2d,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_high_n_u16(a, b, c) \
++ __extension__ \
++ ({ \
++ uint16x8_t b_ = (b); \
++ uint8x8_t a_ = (a); \
++ uint8x16_t result = vcombine_u8 \
++ (a_, vcreate_u8 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.16b,%1.8h,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_high_n_u32(a, b, c) \
++ __extension__ \
++ ({ \
++ uint32x4_t b_ = (b); \
++ uint16x4_t a_ = (a); \
++ uint16x8_t result = vcombine_u16 \
++ (a_, vcreate_u16 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.8h,%1.4s,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_high_n_u64(a, b, c) \
++ __extension__ \
++ ({ \
++ uint64x2_t b_ = (b); \
++ uint32x2_t a_ = (a); \
++ uint32x4_t result = vcombine_u32 \
++ (a_, vcreate_u32 \
++ (__AARCH64_UINT64_C (0x0))); \
++ __asm__ ("shrn2 %0.4s,%1.2d,#%2" \
++ : "+w"(result) \
++ : "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_s16(a, b) \
++ __extension__ \
++ ({ \
++ int16x8_t a_ = (a); \
++ int8x8_t result; \
++ __asm__ ("shrn %0.8b,%1.8h,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_s32(a, b) \
++ __extension__ \
++ ({ \
++ int32x4_t a_ = (a); \
++ int16x4_t result; \
++ __asm__ ("shrn %0.4h,%1.4s,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_s64(a, b) \
++ __extension__ \
++ ({ \
++ int64x2_t a_ = (a); \
++ int32x2_t result; \
++ __asm__ ("shrn %0.2s,%1.2d,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_u16(a, b) \
++ __extension__ \
++ ({ \
++ uint16x8_t a_ = (a); \
++ uint8x8_t result; \
++ __asm__ ("shrn %0.8b,%1.8h,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_u32(a, b) \
++ __extension__ \
++ ({ \
++ uint32x4_t a_ = (a); \
++ uint16x4_t result; \
++ __asm__ ("shrn %0.4h,%1.4s,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vshrn_n_u64(a, b) \
++ __extension__ \
++ ({ \
++ uint64x2_t a_ = (a); \
++ uint32x2_t result; \
++ __asm__ ("shrn %0.2s,%1.2d,%2" \
++ : "=w"(result) \
++ : "w"(a_), "i"(b) \
++ : /* No clobbers */); \
++ result; \
++ })
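As an illustrative sketch only (the helper name below is not from the header), the vshrn_n_* and vshrn_high_n_* forms above pair up to pack two wide vectors into one narrow register:

    #include <arm_neon.h>

    /* Narrow two int16x8_t vectors into a single int8x16_t, shifting each
       lane right by 4 before truncation.  */
    static inline int8x16_t
    narrow_pair_s16 (int16x8_t a, int16x8_t b)
    {
      int8x8_t lo = vshrn_n_s16 (a, 4);        /* fills the low 8 lanes  */
      return vshrn_high_n_s16 (lo, b, 4);      /* fills the high 8 lanes */
    }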
++
++#define vsli_n_p8(a, b, c) \
++ __extension__ \
++ ({ \
++ poly8x8_t b_ = (b); \
++ poly8x8_t a_ = (a); \
++ poly8x8_t result; \
++ __asm__ ("sli %0.8b,%2.8b,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsli_n_p16(a, b, c) \
++ __extension__ \
++ ({ \
++ poly16x4_t b_ = (b); \
++ poly16x4_t a_ = (a); \
++ poly16x4_t result; \
++ __asm__ ("sli %0.4h,%2.4h,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsliq_n_p8(a, b, c) \
++ __extension__ \
++ ({ \
++ poly8x16_t b_ = (b); \
++ poly8x16_t a_ = (a); \
++ poly8x16_t result; \
++ __asm__ ("sli %0.16b,%2.16b,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsliq_n_p16(a, b, c) \
++ __extension__ \
++ ({ \
++ poly16x8_t b_ = (b); \
++ poly16x8_t a_ = (a); \
++ poly16x8_t result; \
++ __asm__ ("sli %0.8h,%2.8h,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsri_n_p8(a, b, c) \
++ __extension__ \
++ ({ \
++ poly8x8_t b_ = (b); \
++ poly8x8_t a_ = (a); \
++ poly8x8_t result; \
++ __asm__ ("sri %0.8b,%2.8b,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsri_n_p16(a, b, c) \
++ __extension__ \
++ ({ \
++ poly16x4_t b_ = (b); \
++ poly16x4_t a_ = (a); \
++ poly16x4_t result; \
++ __asm__ ("sri %0.4h,%2.4h,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsriq_n_p8(a, b, c) \
++ __extension__ \
++ ({ \
++ poly8x16_t b_ = (b); \
++ poly8x16_t a_ = (a); \
++ poly8x16_t result; \
++ __asm__ ("sri %0.16b,%2.16b,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++#define vsriq_n_p16(a, b, c) \
++ __extension__ \
++ ({ \
++ poly16x8_t b_ = (b); \
++ poly16x8_t a_ = (a); \
++ poly16x8_t result; \
++ __asm__ ("sri %0.8h,%2.8h,%3" \
++ : "=w"(result) \
++ : "0"(a_), "w"(b_), "i"(c) \
++ : /* No clobbers */); \
++ result; \
++ })
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_p8 (poly8x8_t a, poly8x8_t b)
++{
++ uint8x8_t result;
++ __asm__ ("cmtst %0.8b, %1.8b, %2.8b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_p16 (poly16x4_t a, poly16x4_t b)
++{
++ uint16x4_t result;
++ __asm__ ("cmtst %0.4h, %1.4h, %2.4h"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_p8 (poly8x16_t a, poly8x16_t b)
++{
++ uint8x16_t result;
++ __asm__ ("cmtst %0.16b, %1.16b, %2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_p16 (poly16x8_t a, poly16x8_t b)
++{
++ uint16x8_t result;
++ __asm__ ("cmtst %0.8h, %1.8h, %2.8h"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++/* End of temporary inline asm implementations. */
++
++/* Start of temporary inline asm for vldn, vstn and friends. */
++
++/* Create struct element types for duplicating loads.
++
++ Create 2 element structures of:
++
++ +------+----+----+----+----+
++ | | 8 | 16 | 32 | 64 |
++ +------+----+----+----+----+
++ |int | Y | Y | N | N |
++ +------+----+----+----+----+
++ |uint | Y | Y | N | N |
++ +------+----+----+----+----+
++ |float | - | Y | N | N |
++ +------+----+----+----+----+
++ |poly | Y | Y | - | - |
++ +------+----+----+----+----+
++
++ Create 3 element structures of:
++
++ +------+----+----+----+----+
++ | | 8 | 16 | 32 | 64 |
++ +------+----+----+----+----+
++ |int | Y | Y | Y | Y |
++ +------+----+----+----+----+
++ |uint | Y | Y | Y | Y |
++ +------+----+----+----+----+
++ |float | - | Y | Y | Y |
++ +------+----+----+----+----+
++ |poly | Y | Y | - | - |
++ +------+----+----+----+----+
++
++ Create 4 element structures of:
++
++ +------+----+----+----+----+
++ | | 8 | 16 | 32 | 64 |
++ +------+----+----+----+----+
++ |int | Y | N | N | Y |
++ +------+----+----+----+----+
++ |uint | Y | N | N | Y |
++ +------+----+----+----+----+
++ |float | - | N | N | Y |
++ +------+----+----+----+----+
++ |poly | Y | N | - | - |
++ +------+----+----+----+----+
++
++ This is required for casting memory reference. */
++#define __STRUCTN(t, sz, nelem) \
++ typedef struct t ## sz ## x ## nelem ## _t { \
++ t ## sz ## _t val[nelem]; \
++ } t ## sz ## x ## nelem ## _t;
++
++/* 2-element structs. */
++__STRUCTN (int, 8, 2)
++__STRUCTN (int, 16, 2)
++__STRUCTN (uint, 8, 2)
++__STRUCTN (uint, 16, 2)
++__STRUCTN (float, 16, 2)
++__STRUCTN (poly, 8, 2)
++__STRUCTN (poly, 16, 2)
++/* 3-element structs. */
++__STRUCTN (int, 8, 3)
++__STRUCTN (int, 16, 3)
++__STRUCTN (int, 32, 3)
++__STRUCTN (int, 64, 3)
++__STRUCTN (uint, 8, 3)
++__STRUCTN (uint, 16, 3)
++__STRUCTN (uint, 32, 3)
++__STRUCTN (uint, 64, 3)
++__STRUCTN (float, 16, 3)
++__STRUCTN (float, 32, 3)
++__STRUCTN (float, 64, 3)
++__STRUCTN (poly, 8, 3)
++__STRUCTN (poly, 16, 3)
++/* 4-element structs. */
++__STRUCTN (int, 8, 4)
++__STRUCTN (int, 64, 4)
++__STRUCTN (uint, 8, 4)
++__STRUCTN (uint, 64, 4)
++__STRUCTN (poly, 8, 4)
++__STRUCTN (float, 64, 4)
++#undef __STRUCTN
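For reference (this expansion is implied by the macro above, not an addition to the patch), the 2-element int8 instantiation of __STRUCTN is equivalent to:

    /* Plain aggregate of N scalar elements; exists only so the vldN/vstN
       wrappers have a concrete type to cast memory references to.  */
    typedef struct int8x2_t
    {
      int8_t val[2];
    } int8x2_t;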
++
++
++#define __ST2_LANE_FUNC(intype, largetype, ptrtype, mode, \
++ qmode, ptr_mode, funcsuffix, signedtype) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst2_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_oi __o; \
++ largetype __temp; \
++ __temp.val[0] \
++ = vcombine_##funcsuffix (__b.val[0], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[1] \
++ = vcombine_##funcsuffix (__b.val[1], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __o = __builtin_aarch64_set_qregoi##qmode (__o, \
++ (signedtype) __temp.val[0], 0); \
++ __o = __builtin_aarch64_set_qregoi##qmode (__o, \
++ (signedtype) __temp.val[1], 1); \
++ __builtin_aarch64_st2_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __o, __c); \
++}
++
++__ST2_LANE_FUNC (float16x4x2_t, float16x8x2_t, float16_t, v4hf, v8hf, hf, f16,
++ float16x8_t)
++__ST2_LANE_FUNC (float32x2x2_t, float32x4x2_t, float32_t, v2sf, v4sf, sf, f32,
++ float32x4_t)
++__ST2_LANE_FUNC (float64x1x2_t, float64x2x2_t, float64_t, df, v2df, df, f64,
++ float64x2_t)
++__ST2_LANE_FUNC (poly8x8x2_t, poly8x16x2_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__ST2_LANE_FUNC (poly16x4x2_t, poly16x8x2_t, poly16_t, v4hi, v8hi, hi, p16,
++ int16x8_t)
++__ST2_LANE_FUNC (int8x8x2_t, int8x16x2_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__ST2_LANE_FUNC (int16x4x2_t, int16x8x2_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__ST2_LANE_FUNC (int32x2x2_t, int32x4x2_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__ST2_LANE_FUNC (int64x1x2_t, int64x2x2_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__ST2_LANE_FUNC (uint8x8x2_t, uint8x16x2_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__ST2_LANE_FUNC (uint16x4x2_t, uint16x8x2_t, uint16_t, v4hi, v8hi, hi, u16,
++ int16x8_t)
++__ST2_LANE_FUNC (uint32x2x2_t, uint32x4x2_t, uint32_t, v2si, v4si, si, u32,
++ int32x4_t)
++__ST2_LANE_FUNC (uint64x1x2_t, uint64x2x2_t, uint64_t, di, v2di, di, u64,
++ int64x2_t)
++
++#undef __ST2_LANE_FUNC
++#define __ST2_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst2q_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ union { intype __i; \
++ __builtin_aarch64_simd_oi __o; } __temp = { __b }; \
++ __builtin_aarch64_st2_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __temp.__o, __c); \
++}
++
++__ST2_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16)
++__ST2_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf, f32)
++__ST2_LANE_FUNC (float64x2x2_t, float64_t, v2df, df, f64)
++__ST2_LANE_FUNC (poly8x16x2_t, poly8_t, v16qi, qi, p8)
++__ST2_LANE_FUNC (poly16x8x2_t, poly16_t, v8hi, hi, p16)
++__ST2_LANE_FUNC (int8x16x2_t, int8_t, v16qi, qi, s8)
++__ST2_LANE_FUNC (int16x8x2_t, int16_t, v8hi, hi, s16)
++__ST2_LANE_FUNC (int32x4x2_t, int32_t, v4si, si, s32)
++__ST2_LANE_FUNC (int64x2x2_t, int64_t, v2di, di, s64)
++__ST2_LANE_FUNC (uint8x16x2_t, uint8_t, v16qi, qi, u8)
++__ST2_LANE_FUNC (uint16x8x2_t, uint16_t, v8hi, hi, u16)
++__ST2_LANE_FUNC (uint32x4x2_t, uint32_t, v4si, si, u32)
++__ST2_LANE_FUNC (uint64x2x2_t, uint64_t, v2di, di, u64)
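A minimal usage sketch of the vst2q_lane_* forms generated above; the function name and the lane index are illustrative assumptions, not taken from the header:

    #include <arm_neon.h>

    /* Store lane 1 of each half of an interleaved pair: writes v.val[0][1]
       followed by v.val[1][1] to dst[0] and dst[1].  */
    void store_pair_lane1 (float32_t *dst, float32x4x2_t v)
    {
      vst2q_lane_f32 (dst, v, 1);
    }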
++
++#define __ST3_LANE_FUNC(intype, largetype, ptrtype, mode, \
++ qmode, ptr_mode, funcsuffix, signedtype) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst3_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_ci __o; \
++ largetype __temp; \
++ __temp.val[0] \
++ = vcombine_##funcsuffix (__b.val[0], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[1] \
++ = vcombine_##funcsuffix (__b.val[1], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[2] \
++ = vcombine_##funcsuffix (__b.val[2], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[0], 0); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[1], 1); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[2], 2); \
++ __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __o, __c); \
++}
++
++__ST3_LANE_FUNC (float16x4x3_t, float16x8x3_t, float16_t, v4hf, v8hf, hf, f16,
++ float16x8_t)
++__ST3_LANE_FUNC (float32x2x3_t, float32x4x3_t, float32_t, v2sf, v4sf, sf, f32,
++ float32x4_t)
++__ST3_LANE_FUNC (float64x1x3_t, float64x2x3_t, float64_t, df, v2df, df, f64,
++ float64x2_t)
++__ST3_LANE_FUNC (poly8x8x3_t, poly8x16x3_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__ST3_LANE_FUNC (poly16x4x3_t, poly16x8x3_t, poly16_t, v4hi, v8hi, hi, p16,
++ int16x8_t)
++__ST3_LANE_FUNC (int8x8x3_t, int8x16x3_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__ST3_LANE_FUNC (int16x4x3_t, int16x8x3_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__ST3_LANE_FUNC (int32x2x3_t, int32x4x3_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__ST3_LANE_FUNC (int64x1x3_t, int64x2x3_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__ST3_LANE_FUNC (uint8x8x3_t, uint8x16x3_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__ST3_LANE_FUNC (uint16x4x3_t, uint16x8x3_t, uint16_t, v4hi, v8hi, hi, u16,
++ int16x8_t)
++__ST3_LANE_FUNC (uint32x2x3_t, uint32x4x3_t, uint32_t, v2si, v4si, si, u32,
++ int32x4_t)
++__ST3_LANE_FUNC (uint64x1x3_t, uint64x2x3_t, uint64_t, di, v2di, di, u64,
++ int64x2_t)
++
++#undef __ST3_LANE_FUNC
++#define __ST3_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst3q_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ union { intype __i; \
++ __builtin_aarch64_simd_ci __o; } __temp = { __b }; \
++ __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __temp.__o, __c); \
++}
++
++__ST3_LANE_FUNC (float16x8x3_t, float16_t, v8hf, hf, f16)
++__ST3_LANE_FUNC (float32x4x3_t, float32_t, v4sf, sf, f32)
++__ST3_LANE_FUNC (float64x2x3_t, float64_t, v2df, df, f64)
++__ST3_LANE_FUNC (poly8x16x3_t, poly8_t, v16qi, qi, p8)
++__ST3_LANE_FUNC (poly16x8x3_t, poly16_t, v8hi, hi, p16)
++__ST3_LANE_FUNC (int8x16x3_t, int8_t, v16qi, qi, s8)
++__ST3_LANE_FUNC (int16x8x3_t, int16_t, v8hi, hi, s16)
++__ST3_LANE_FUNC (int32x4x3_t, int32_t, v4si, si, s32)
++__ST3_LANE_FUNC (int64x2x3_t, int64_t, v2di, di, s64)
++__ST3_LANE_FUNC (uint8x16x3_t, uint8_t, v16qi, qi, u8)
++__ST3_LANE_FUNC (uint16x8x3_t, uint16_t, v8hi, hi, u16)
++__ST3_LANE_FUNC (uint32x4x3_t, uint32_t, v4si, si, u32)
++__ST3_LANE_FUNC (uint64x2x3_t, uint64_t, v2di, di, u64)
++
++#define __ST4_LANE_FUNC(intype, largetype, ptrtype, mode, \
++ qmode, ptr_mode, funcsuffix, signedtype) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst4_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_xi __o; \
++ largetype __temp; \
++ __temp.val[0] \
++ = vcombine_##funcsuffix (__b.val[0], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[1] \
++ = vcombine_##funcsuffix (__b.val[1], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[2] \
++ = vcombine_##funcsuffix (__b.val[2], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __temp.val[3] \
++ = vcombine_##funcsuffix (__b.val[3], \
++ vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[0], 0); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[1], 1); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[2], 2); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[3], 3); \
++ __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __o, __c); \
++}
++
++__ST4_LANE_FUNC (float16x4x4_t, float16x8x4_t, float16_t, v4hf, v8hf, hf, f16,
++ float16x8_t)
++__ST4_LANE_FUNC (float32x2x4_t, float32x4x4_t, float32_t, v2sf, v4sf, sf, f32,
++ float32x4_t)
++__ST4_LANE_FUNC (float64x1x4_t, float64x2x4_t, float64_t, df, v2df, df, f64,
++ float64x2_t)
++__ST4_LANE_FUNC (poly8x8x4_t, poly8x16x4_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__ST4_LANE_FUNC (poly16x4x4_t, poly16x8x4_t, poly16_t, v4hi, v8hi, hi, p16,
++ int16x8_t)
++__ST4_LANE_FUNC (int8x8x4_t, int8x16x4_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__ST4_LANE_FUNC (int16x4x4_t, int16x8x4_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__ST4_LANE_FUNC (int32x2x4_t, int32x4x4_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__ST4_LANE_FUNC (int64x1x4_t, int64x2x4_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__ST4_LANE_FUNC (uint8x8x4_t, uint8x16x4_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__ST4_LANE_FUNC (uint16x4x4_t, uint16x8x4_t, uint16_t, v4hi, v8hi, hi, u16,
++ int16x8_t)
++__ST4_LANE_FUNC (uint32x2x4_t, uint32x4x4_t, uint32_t, v2si, v4si, si, u32,
++ int32x4_t)
++__ST4_LANE_FUNC (uint64x1x4_t, uint64x2x4_t, uint64_t, di, v2di, di, u64,
++ int64x2_t)
++
++#undef __ST4_LANE_FUNC
++#define __ST4_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
++__extension__ extern __inline void \
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++vst4q_lane_ ## funcsuffix (ptrtype *__ptr, \
++ intype __b, const int __c) \
++{ \
++ union { intype __i; \
++ __builtin_aarch64_simd_xi __o; } __temp = { __b }; \
++ __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
++ __ptr, __temp.__o, __c); \
++}
++
++__ST4_LANE_FUNC (float16x8x4_t, float16_t, v8hf, hf, f16)
++__ST4_LANE_FUNC (float32x4x4_t, float32_t, v4sf, sf, f32)
++__ST4_LANE_FUNC (float64x2x4_t, float64_t, v2df, df, f64)
++__ST4_LANE_FUNC (poly8x16x4_t, poly8_t, v16qi, qi, p8)
++__ST4_LANE_FUNC (poly16x8x4_t, poly16_t, v8hi, hi, p16)
++__ST4_LANE_FUNC (int8x16x4_t, int8_t, v16qi, qi, s8)
++__ST4_LANE_FUNC (int16x8x4_t, int16_t, v8hi, hi, s16)
++__ST4_LANE_FUNC (int32x4x4_t, int32_t, v4si, si, s32)
++__ST4_LANE_FUNC (int64x2x4_t, int64_t, v2di, di, s64)
++__ST4_LANE_FUNC (uint8x16x4_t, uint8_t, v16qi, qi, u8)
++__ST4_LANE_FUNC (uint16x8x4_t, uint16_t, v8hi, hi, u16)
++__ST4_LANE_FUNC (uint32x4x4_t, uint32_t, v4si, si, u32)
++__ST4_LANE_FUNC (uint64x2x4_t, uint64_t, v2di, di, u64)
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddlv_s32 (int32x2_t a)
++{
++ int64_t result;
++ __asm__ ("saddlp %0.1d, %1.2s" : "=w"(result) : "w"(a) : );
++ return result;
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddlv_u32 (uint32x2_t a)
++{
++ uint64_t result;
++ __asm__ ("uaddlp %0.1d, %1.2s" : "=w"(result) : "w"(a) : );
++ return result;
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __c)
++{
++ return __builtin_aarch64_sqdmulh_laneqv4hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __c)
++{
++ return __builtin_aarch64_sqdmulh_laneqv2si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++{
++ return __builtin_aarch64_sqdmulh_laneqv8hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++{
++ return __builtin_aarch64_sqdmulh_laneqv4si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __c)
++{
++ return __builtin_aarch64_sqrdmulh_laneqv4hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __c)
++{
++ return __builtin_aarch64_sqrdmulh_laneqv2si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++{
++ return __builtin_aarch64_sqrdmulh_laneqv8hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++{
++ return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
++}
++
++/* Table intrinsics. */
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1_p8 (poly8x16_t a, uint8x8_t b)
++{
++ poly8x8_t result;
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1_s8 (int8x16_t a, uint8x8_t b)
++{
++ int8x8_t result;
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1_u8 (uint8x16_t a, uint8x8_t b)
++{
++ uint8x8_t result;
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1q_p8 (poly8x16_t a, uint8x16_t b)
++{
++ poly8x16_t result;
++ __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1q_s8 (int8x16_t a, uint8x16_t b)
++{
++ int8x16_t result;
++ __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl1q_u8 (uint8x16_t a, uint8x16_t b)
++{
++ uint8x16_t result;
++ __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
++ : "=w"(result)
++ : "w"(a), "w"(b)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1_s8 (int8x8_t r, int8x16_t tab, uint8x8_t idx)
++{
++ int8x8_t result = r;
++ __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1_u8 (uint8x8_t r, uint8x16_t tab, uint8x8_t idx)
++{
++ uint8x8_t result = r;
++ __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1_p8 (poly8x8_t r, poly8x16_t tab, uint8x8_t idx)
++{
++ poly8x8_t result = r;
++ __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1q_s8 (int8x16_t r, int8x16_t tab, uint8x16_t idx)
++{
++ int8x16_t result = r;
++ __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1q_u8 (uint8x16_t r, uint8x16_t tab, uint8x16_t idx)
++{
++ uint8x16_t result = r;
++ __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx1q_p8 (poly8x16_t r, poly8x16_t tab, uint8x16_t idx)
++{
++ poly8x16_t result = r;
++ __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
++ : "+w"(result)
++ : "w"(tab), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
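A hedged usage sketch of the single-table lookup forms above (the helper name and index constant are illustrative only): vqtbl1_u8 selects bytes of a 16-byte table by index, with out-of-range indices yielding zero.

    #include <arm_neon.h>

    /* Reverse the eight bytes of a 64-bit vector with one TBL; vcreate_u8
       takes lane 0 from the least-significant byte of the constant, so the
       index vector is {7,6,5,4,3,2,1,0}.  */
    static inline uint8x8_t
    reverse_u8x8 (uint8x8_t v)
    {
      uint8x16_t tab = vcombine_u8 (v, vdup_n_u8 (0));
      uint8x8_t idx = vcreate_u8 (0x0001020304050607ULL);
      return vqtbl1_u8 (tab, idx);
    }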
++
++/* V7 legacy table intrinsics. */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl1_s8 (int8x8_t tab, int8x8_t idx)
++{
++ int8x8_t result;
++ int8x16_t temp = vcombine_s8 (tab, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl1_u8 (uint8x8_t tab, uint8x8_t idx)
++{
++ uint8x8_t result;
++ uint8x16_t temp = vcombine_u8 (tab, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl1_p8 (poly8x8_t tab, uint8x8_t idx)
++{
++ poly8x8_t result;
++ poly8x16_t temp = vcombine_p8 (tab, vcreate_p8 (__AARCH64_UINT64_C (0x0)));
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl2_s8 (int8x8x2_t tab, int8x8_t idx)
++{
++ int8x8_t result;
++ int8x16_t temp = vcombine_s8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl2_u8 (uint8x8x2_t tab, uint8x8_t idx)
++{
++ uint8x8_t result;
++ uint8x16_t temp = vcombine_u8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl2_p8 (poly8x8x2_t tab, uint8x8_t idx)
++{
++ poly8x8_t result;
++ poly8x16_t temp = vcombine_p8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
++ : "=w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl3_s8 (int8x8x3_t tab, int8x8_t idx)
++{
++ int8x8_t result;
++ int8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_s8 (tab.val[2], vcreate_s8 (__AARCH64_UINT64_C (0x0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = __builtin_aarch64_tbl3v8qi (__o, idx);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl3_u8 (uint8x8x3_t tab, uint8x8_t idx)
++{
++ uint8x8_t result;
++ uint8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_u8 (tab.val[2], vcreate_u8 (__AARCH64_UINT64_C (0x0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl3_p8 (poly8x8x3_t tab, uint8x8_t idx)
++{
++ poly8x8_t result;
++ poly8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_p8 (tab.val[2], vcreate_p8 (__AARCH64_UINT64_C (0x0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl4_s8 (int8x8x4_t tab, int8x8_t idx)
++{
++ int8x8_t result;
++ int8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = __builtin_aarch64_tbl3v8qi (__o, idx);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl4_u8 (uint8x8x4_t tab, uint8x8_t idx)
++{
++ uint8x8_t result;
++ uint8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbl4_p8 (poly8x8x4_t tab, uint8x8_t idx)
++{
++ poly8x8_t result;
++ poly8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]);
++ temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return result;
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx2_s8 (int8x8_t r, int8x8x2_t tab, int8x8_t idx)
++{
++ int8x8_t result = r;
++ int8x16_t temp = vcombine_s8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
++ : "+w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx2_u8 (uint8x8_t r, uint8x8x2_t tab, uint8x8_t idx)
++{
++ uint8x8_t result = r;
++ uint8x16_t temp = vcombine_u8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
++ : "+w"(result)
++ : "w"(temp), "w"(idx)
++ : /* No clobbers */);
++ return result;
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx2_p8 (poly8x8_t r, poly8x8x2_t tab, uint8x8_t idx)
++{
++ poly8x8_t result = r;
++ poly8x16_t temp = vcombine_p8 (tab.val[0], tab.val[1]);
++ __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
++ : "+w"(result)
++ : "w"(temp), "w"(idx)
+ : /* No clobbers */);
+ return result;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vmvnq_p8 (poly8x16_t a)
++/* End of temporary inline asm. */
++
++/* Start of optimal implementations in approved order. */
++
++/* vabd. */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabds_f32 (float32_t __a, float32_t __b)
++{
++ return __builtin_aarch64_fabdsf (__a, __b);
++}
++
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabdd_f64 (float64_t __a, float64_t __b)
++{
++ return __builtin_aarch64_fabddf (__a, __b);
++}
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabd_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return __builtin_aarch64_fabdv2sf (__a, __b);
++}
++
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabd_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return (float64x1_t) {vabdd_f64 (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0))};
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabdq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return __builtin_aarch64_fabdv4sf (__a, __b);
++}
++
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabdq_f64 (float64x2_t __a, float64x2_t __b)
++{
++ return __builtin_aarch64_fabdv2df (__a, __b);
++}
++
++/* vabs */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_f32 (float32x2_t __a)
++{
++ return __builtin_aarch64_absv2sf (__a);
++}
++
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_f64 (float64x1_t __a)
++{
++ return (float64x1_t) {__builtin_fabs (__a[0])};
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_s8 (int8x8_t __a)
++{
++ return __builtin_aarch64_absv8qi (__a);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_s16 (int16x4_t __a)
++{
++ return __builtin_aarch64_absv4hi (__a);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_s32 (int32x2_t __a)
++{
++ return __builtin_aarch64_absv2si (__a);
++}
++
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_s64 (int64x1_t __a)
++{
++ return (int64x1_t) {__builtin_aarch64_absdi (__a[0])};
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_f32 (float32x4_t __a)
++{
++ return __builtin_aarch64_absv4sf (__a);
++}
++
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_f64 (float64x2_t __a)
++{
++ return __builtin_aarch64_absv2df (__a);
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_s8 (int8x16_t __a)
++{
++ return __builtin_aarch64_absv16qi (__a);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_s16 (int16x8_t __a)
++{
++ return __builtin_aarch64_absv8hi (__a);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_s32 (int32x4_t __a)
++{
++ return __builtin_aarch64_absv4si (__a);
++}
++
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_s64 (int64x2_t __a)
++{
++ return __builtin_aarch64_absv2di (__a);
++}
++
++/* vadd */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddd_s64 (int64_t __a, int64_t __b)
++{
++ return __a + __b;
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddd_u64 (uint64_t __a, uint64_t __b)
++{
++ return __a + __b;
++}
++
++/* vaddv */
++
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_s8 (int8x8_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v8qi (__a);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_s16 (int16x4_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v4hi (__a);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_s32 (int32x2_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v2si (__a);
++}
++
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_u8 (uint8x8_t __a)
++{
++ return (uint8_t) __builtin_aarch64_reduc_plus_scal_v8qi ((int8x8_t) __a);
++}
++
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_u16 (uint16x4_t __a)
++{
++ return (uint16_t) __builtin_aarch64_reduc_plus_scal_v4hi ((int16x4_t) __a);
++}
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_u32 (uint32x2_t __a)
++{
++ return (uint32_t) __builtin_aarch64_reduc_plus_scal_v2si ((int32x2_t) __a);
++}
++
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_s8 (int8x16_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v16qi (__a);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_s16 (int16x8_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v8hi (__a);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_s32 (int32x4_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v4si (__a);
++}
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_s64 (int64x2_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v2di (__a);
++}
++
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_u8 (uint8x16_t __a)
++{
++ return (uint8_t) __builtin_aarch64_reduc_plus_scal_v16qi ((int8x16_t) __a);
++}
++
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_u16 (uint16x8_t __a)
++{
++ return (uint16_t) __builtin_aarch64_reduc_plus_scal_v8hi ((int16x8_t) __a);
++}
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_u32 (uint32x4_t __a)
++{
++ return (uint32_t) __builtin_aarch64_reduc_plus_scal_v4si ((int32x4_t) __a);
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_u64 (uint64x2_t __a)
++{
++ return (uint64_t) __builtin_aarch64_reduc_plus_scal_v2di ((int64x2_t) __a);
++}
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddv_f32 (float32x2_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v2sf (__a);
++}
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_f32 (float32x4_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v4sf (__a);
++}
++
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddvq_f64 (float64x2_t __a)
++{
++ return __builtin_aarch64_reduc_plus_scal_v2df (__a);
++}
++
++/* vbsl */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4hf_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2sf_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_f64 (uint64x1_t __a, float64x1_t __b, float64x1_t __c)
++{
++ return (float64x1_t)
++ { __builtin_aarch64_simd_bsldf_suss (__a[0], __b[0], __c[0]) };
++}
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_p8 (uint8x8_t __a, poly8x8_t __b, poly8x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8qi_pupp (__a, __b, __c);
++}
++
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4hi_pupp (__a, __b, __c);
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_s8 (uint8x8_t __a, int8x8_t __b, int8x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8qi_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_s16 (uint16x4_t __a, int16x4_t __b, int16x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4hi_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_s32 (uint32x2_t __a, int32x2_t __b, int32x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2si_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_s64 (uint64x1_t __a, int64x1_t __b, int64x1_t __c)
++{
++ return (int64x1_t)
++ {__builtin_aarch64_simd_bsldi_suss (__a[0], __b[0], __c[0])};
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8qi_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4hi_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2si_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c)
++{
++ return (uint64x1_t)
++ {__builtin_aarch64_simd_bsldi_uuuu (__a[0], __b[0], __c[0])};
++}
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8hf_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4sf_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_f64 (uint64x2_t __a, float64x2_t __b, float64x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2df_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_p8 (uint8x16_t __a, poly8x16_t __b, poly8x16_t __c)
++{
++ return __builtin_aarch64_simd_bslv16qi_pupp (__a, __b, __c);
++}
++
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_p16 (uint16x8_t __a, poly16x8_t __b, poly16x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8hi_pupp (__a, __b, __c);
++}
++
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_s8 (uint8x16_t __a, int8x16_t __b, int8x16_t __c)
++{
++ return __builtin_aarch64_simd_bslv16qi_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_s16 (uint16x8_t __a, int16x8_t __b, int16x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8hi_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_s32 (uint32x4_t __a, int32x4_t __b, int32x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4si_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_s64 (uint64x2_t __a, int64x2_t __b, int64x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2di_suss (__a, __b, __c);
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
++{
++ return __builtin_aarch64_simd_bslv16qi_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c)
++{
++ return __builtin_aarch64_simd_bslv8hi_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c)
++{
++ return __builtin_aarch64_simd_bslv4si_uuuu (__a, __b, __c);
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
++{
++ return __builtin_aarch64_simd_bslv2di_uuuu (__a, __b, __c);
++}
++
++/* ARMv8.1-A intrinsics. */
++#pragma GCC push_options
++#pragma GCC target ("arch=armv8.1-a")
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
++{
++ return __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
++{
++ return __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
++{
++ return __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
++{
++ return __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
++{
++ return __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
++{
++ return __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
++{
++ return __builtin_aarch64_sqrdmlshv8hi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
++{
++ return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+ {
+- poly8x16_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmvnq_s8 (int8x16_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+ {
+- int8x16_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmvnq_s16 (int16x8_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+ {
+- int16x8_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlsh_laneqv8hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmvnq_s32 (int32x4_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+ {
+- int32x4_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlsh_laneqv4si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmvnq_u8 (uint8x16_t a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+ {
+- uint8x16_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlah_lanev4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmvnq_u16 (uint16x8_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+ {
+- uint16x8_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlah_lanev2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmvnq_u32 (uint32x4_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+ {
+- uint32x4_t result;
+- __asm__ ("mvn %0.16b,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlah_lanev8hi (__a, __b, __c, __d);
+ }
+
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_lanev4si (__a, __b, __c, __d);
++}
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vpadal_s8 (int16x4_t a, int8x8_t b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahh_s16 (int16_t __a, int16_t __b, int16_t __c)
+ {
+- int16x4_t result;
+- __asm__ ("sadalp %0.4h,%2.8b"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (int16_t) __builtin_aarch64_sqrdmlahhi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vpadal_s16 (int32x2_t a, int16x4_t b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
+ {
+- int32x2_t result;
+- __asm__ ("sadalp %0.2s,%2.4h"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __builtin_aarch64_sqrdmlah_lanehi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqhi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahs_s32 (int32_t __a, int32_t __b, int32_t __c)
++{
++ return (int32_t) __builtin_aarch64_sqrdmlahsi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_lanesi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlahs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlah_laneqsi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanev4hi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanev2si (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanev8hi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanev4si (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshh_s16 (int16_t __a, int16_t __b, int16_t __c)
++{
++ return (int16_t) __builtin_aarch64_sqrdmlshhi (__a, __b, __c);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanehi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_laneqhi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshs_s32 (int32_t __a, int32_t __b, int32_t __c)
++{
++ return (int32_t) __builtin_aarch64_sqrdmlshsi (__a, __b, __c);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_lanesi (__a, __b, __c, __d);
++}
++
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
++{
++ return __builtin_aarch64_sqrdmlsh_laneqsi (__a, __b, __c, __d);
++}
++#pragma GCC pop_options
++
++#pragma GCC push_options
++#pragma GCC target ("+nothing+crypto")
++/* vaes */
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaeseq_u8 (uint8x16_t data, uint8x16_t key)
++{
++ return __builtin_aarch64_crypto_aesev16qi_uuu (data, key);
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaesdq_u8 (uint8x16_t data, uint8x16_t key)
++{
++ return __builtin_aarch64_crypto_aesdv16qi_uuu (data, key);
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaesmcq_u8 (uint8x16_t data)
++{
++ return __builtin_aarch64_crypto_aesmcv16qi_uu (data);
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaesimcq_u8 (uint8x16_t data)
++{
++ return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
++}
++#pragma GCC pop_options
++
++/* vcage */
++
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcage_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return vabs_f64 (__a) >= vabs_f64 (__b);
++}
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcages_f32 (float32_t __a, float32_t __b)
++{
++ return __builtin_fabsf (__a) >= __builtin_fabsf (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcage_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return vabs_f32 (__a) >= vabs_f32 (__b);
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcageq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return vabsq_f32 (__a) >= vabsq_f32 (__b);
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaged_f64 (float64_t __a, float64_t __b)
++{
++ return __builtin_fabs (__a) >= __builtin_fabs (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcageq_f64 (float64x2_t __a, float64x2_t __b)
++{
++ return vabsq_f64 (__a) >= vabsq_f64 (__b);
++}
++
++/* vcagt */
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagts_f32 (float32_t __a, float32_t __b)
++{
++ return __builtin_fabsf (__a) > __builtin_fabsf (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagt_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return vabs_f32 (__a) > vabs_f32 (__b);
++}
++
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagt_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return vabs_f64 (__a) > vabs_f64 (__b);
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagtq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return vabsq_f32 (__a) > vabsq_f32 (__b);
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagtd_f64 (float64_t __a, float64_t __b)
++{
++ return __builtin_fabs (__a) > __builtin_fabs (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagtq_f64 (float64x2_t __a, float64x2_t __b)
++{
++ return vabsq_f64 (__a) > vabsq_f64 (__b);
++}
++
++/* vcale */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcale_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return vabs_f32 (__a) <= vabs_f32 (__b);
++}
++
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcale_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return vabs_f64 (__a) <= vabs_f64 (__b);
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaled_f64 (float64_t __a, float64_t __b)
++{
++ return __builtin_fabs (__a) <= __builtin_fabs (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcales_f32 (float32_t __a, float32_t __b)
++{
++ return __builtin_fabsf (__a) <= __builtin_fabsf (__b) ? -1 : 0;
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaleq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return vabsq_f32 (__a) <= vabsq_f32 (__b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vpadal_s32 (int64x1_t a, int32x2_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaleq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- int64x1_t result;
+- __asm__ ("sadalp %0.1d,%2.2s"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return vabsq_f64 (__a) <= vabsq_f64 (__b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vpadal_u8 (uint16x4_t a, uint8x8_t b)
++/* vcalt */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcalt_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- uint16x4_t result;
+- __asm__ ("uadalp %0.4h,%2.8b"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return vabs_f32 (__a) < vabs_f32 (__b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vpadal_u16 (uint32x2_t a, uint16x4_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcalt_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- uint32x2_t result;
+- __asm__ ("uadalp %0.2s,%2.4h"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return vabs_f64 (__a) < vabs_f64 (__b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vpadal_u32 (uint64x1_t a, uint32x2_t b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaltd_f64 (float64_t __a, float64_t __b)
+ {
+- uint64x1_t result;
+- __asm__ ("uadalp %0.1d,%2.2s"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __builtin_fabs (__a) < __builtin_fabs (__b) ? -1 : 0;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vpadalq_s8 (int16x8_t a, int8x16_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaltq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- int16x8_t result;
+- __asm__ ("sadalp %0.8h,%2.16b"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return vabsq_f32 (__a) < vabsq_f32 (__b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vpadalq_s16 (int32x4_t a, int16x8_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaltq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- int32x4_t result;
+- __asm__ ("sadalp %0.4s,%2.8h"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return vabsq_f64 (__a) < vabsq_f64 (__b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vpadalq_s32 (int64x2_t a, int32x4_t b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcalts_f32 (float32_t __a, float32_t __b)
+ {
+- int64x2_t result;
+- __asm__ ("sadalp %0.2d,%2.4s"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __builtin_fabsf (__a) < __builtin_fabsf (__b) ? -1 : 0;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vpadalq_u8 (uint16x8_t a, uint8x16_t b)
++/* vceq - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- uint16x8_t result;
+- __asm__ ("uadalp %0.8h,%2.16b"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vpadalq_u16 (uint32x4_t a, uint16x8_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- uint32x4_t result;
+- __asm__ ("uadalp %0.4s,%2.8h"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vpadalq_u32 (uint64x2_t a, uint32x4_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- uint64x2_t result;
+- __asm__ ("uadalp %0.2d,%2.4s"
+- : "=w"(result)
+- : "0"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x8_t) (__a == __b);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vpadd_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- float32x2_t result;
+- __asm__ ("faddp %0.2s,%1.2s,%2.2s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x8_t) (__a == __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vpaddl_s8 (int8x8_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- int16x4_t result;
+- __asm__ ("saddlp %0.4h,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint16x4_t) (__a == __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vpaddl_s16 (int16x4_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- int32x2_t result;
+- __asm__ ("saddlp %0.2s,%1.4h"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a == __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vpaddl_s32 (int32x2_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- int64x1_t result;
+- __asm__ ("saddlp %0.1d,%1.2s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vpaddl_u8 (uint8x8_t a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- uint16x4_t result;
+- __asm__ ("uaddlp %0.4h,%1.8b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vpaddl_u16 (uint16x4_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- uint32x2_t result;
+- __asm__ ("uaddlp %0.2s,%1.4h"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vpaddl_u32 (uint32x2_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- uint64x1_t result;
+- __asm__ ("uaddlp %0.1d,%1.2s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vpaddlq_s8 (int8x16_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+- int16x8_t result;
+- __asm__ ("saddlp %0.8h,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vpaddlq_s16 (int16x8_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- int32x4_t result;
+- __asm__ ("saddlp %0.4s,%1.8h"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a == __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vpaddlq_s32 (int32x4_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- int64x2_t result;
+- __asm__ ("saddlp %0.2d,%1.4s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vpaddlq_u8 (uint8x16_t a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+- uint16x8_t result;
+- __asm__ ("uaddlp %0.8h,%1.16b"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vpaddlq_u16 (uint16x8_t a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- uint32x4_t result;
+- __asm__ ("uaddlp %0.4s,%1.8h"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a == __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vpaddlq_u32 (uint32x4_t a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- uint64x2_t result;
+- __asm__ ("uaddlp %0.2d,%1.4s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint16x8_t) (__a == __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vpaddq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- float32x4_t result;
+- __asm__ ("faddp %0.4s,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a == __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vpaddq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- float64x2_t result;
+- __asm__ ("faddp %0.2d,%1.2d,%2.2d"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a == __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vpaddq_s8 (int8x16_t a, int8x16_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- int8x16_t result;
+- __asm__ ("addp %0.16b,%1.16b,%2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vpaddq_s16 (int16x8_t a, int16x8_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- int16x8_t result;
+- __asm__ ("addp %0.8h,%1.8h,%2.8h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vpaddq_s32 (int32x4_t a, int32x4_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- int32x4_t result;
+- __asm__ ("addp %0.4s,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == __b);
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_u64 (uint64x2_t __a, uint64x2_t __b)
++{
++ return (__a == __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vpaddq_s64 (int64x2_t a, int64x2_t b)
++/* vceq - scalar. */
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqs_f32 (float32_t __a, float32_t __b)
+ {
+- int64x2_t result;
+- __asm__ ("addp %0.2d,%1.2d,%2.2d"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __a == __b ? -1 : 0;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vpaddq_u8 (uint8x16_t a, uint8x16_t b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqd_s64 (int64_t __a, int64_t __b)
+ {
+- uint8x16_t result;
+- __asm__ ("addp %0.16b,%1.16b,%2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __a == __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vpaddq_u16 (uint16x8_t a, uint16x8_t b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqd_u64 (uint64_t __a, uint64_t __b)
+ {
+- uint16x8_t result;
+- __asm__ ("addp %0.8h,%1.8h,%2.8h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __a == __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vpaddq_u32 (uint32x4_t a, uint32x4_t b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqd_f64 (float64_t __a, float64_t __b)
+ {
+- uint32x4_t result;
+- __asm__ ("addp %0.4s,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return __a == __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vpaddq_u64 (uint64x2_t a, uint64x2_t b)
++/* vceqz - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_f32 (float32x2_t __a)
+ {
+- uint64x2_t result;
+- __asm__ ("addp %0.2d,%1.2d,%2.2d"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a == 0.0f);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vpadds_f32 (float32x2_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_f64 (float64x1_t __a)
+ {
+- float32_t result;
+- __asm__ ("faddp %s0,%1.2s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a == (float64x1_t) {0.0});
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqdmulh_n_s16 (int16x4_t a, int16_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_p8 (poly8x8_t __a)
+ {
+- int16x4_t result;
+- __asm__ ("sqdmulh %0.4h,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x8_t) (__a == 0);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqdmulh_n_s32 (int32x2_t a, int32_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_s8 (int8x8_t __a)
+ {
+- int32x2_t result;
+- __asm__ ("sqdmulh %0.2s,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x8_t) (__a == 0);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqdmulhq_n_s16 (int16x8_t a, int16_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_s16 (int16x4_t __a)
+ {
+- int16x8_t result;
+- __asm__ ("sqdmulh %0.8h,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint16x4_t) (__a == 0);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmulhq_n_s32 (int32x4_t a, int32_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_s32 (int32x2_t __a)
+ {
+- int32x4_t result;
+- __asm__ ("sqdmulh %0.4s,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a == 0);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqmovn_high_s16 (int8x8_t a, int16x8_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_s64 (int64x1_t __a)
+ {
+- int8x16_t result = vcombine_s8 (a, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtn2 %0.16b, %1.8h"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a == __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqmovn_high_s32 (int16x4_t a, int32x4_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_u8 (uint8x8_t __a)
+ {
+- int16x8_t result = vcombine_s16 (a, vcreate_s16 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtn2 %0.8h, %1.4s"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == 0);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqmovn_high_s64 (int32x2_t a, int64x2_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_u16 (uint16x4_t __a)
+ {
+- int32x4_t result = vcombine_s32 (a, vcreate_s32 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtn2 %0.4s, %1.2d"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == 0);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqmovn_high_u16 (uint8x8_t a, uint16x8_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_u32 (uint32x2_t __a)
+ {
+- uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("uqxtn2 %0.16b, %1.8h"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == 0);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqmovn_high_u32 (uint16x4_t a, uint32x4_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_u64 (uint64x1_t __a)
+ {
+- uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("uqxtn2 %0.8h, %1.4s"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == __AARCH64_UINT64_C (0));
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqmovn_high_u64 (uint32x2_t a, uint64x2_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_f32 (float32x4_t __a)
+ {
+- uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("uqxtn2 %0.4s, %1.2d"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a == 0.0f);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqmovun_high_s16 (uint8x8_t a, int16x8_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_f64 (float64x2_t __a)
+ {
+- uint8x16_t result = vcombine_u8 (a, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtun2 %0.16b, %1.8h"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a == 0.0f);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqmovun_high_s32 (uint16x4_t a, int32x4_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_p8 (poly8x16_t __a)
+ {
+- uint16x8_t result = vcombine_u16 (a, vcreate_u16 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtun2 %0.8h, %1.4s"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a == 0);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqmovun_high_s64 (uint32x2_t a, int64x2_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_s8 (int8x16_t __a)
+ {
+- uint32x4_t result = vcombine_u32 (a, vcreate_u32 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("sqxtun2 %0.4s, %1.2d"
+- : "+w"(result)
+- : "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a == 0);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmulh_n_s16 (int16x4_t a, int16_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_s16 (int16x8_t __a)
+ {
+- int16x4_t result;
+- __asm__ ("sqrdmulh %0.4h,%1.4h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint16x8_t) (__a == 0);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmulh_n_s32 (int32x2_t a, int32_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_s32 (int32x4_t __a)
+ {
+- int32x2_t result;
+- __asm__ ("sqrdmulh %0.2s,%1.2s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a == 0);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmulhq_n_s16 (int16x8_t a, int16_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_s64 (int64x2_t __a)
+ {
+- int16x8_t result;
+- __asm__ ("sqrdmulh %0.8h,%1.8h,%2.h[0]"
+- : "=w"(result)
+- : "w"(a), "x"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a == __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmulhq_n_s32 (int32x4_t a, int32_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_u8 (uint8x16_t __a)
+ {
+- int32x4_t result;
+- __asm__ ("sqrdmulh %0.4s,%1.4s,%2.s[0]"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a == 0);
+ }
+
+-#define vqrshrn_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- int8x8_t a_ = (a); \
+- int8x16_t result = vcombine_s8 \
+- (a_, vcreate_s8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrn2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vqrshrn_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- int16x4_t a_ = (a); \
+- int16x8_t result = vcombine_s16 \
+- (a_, vcreate_s16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrn2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vqrshrn_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- int32x2_t a_ = (a); \
+- int32x4_t result = vcombine_s32 \
+- (a_, vcreate_s32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrn2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_u16 (uint16x8_t __a)
++{
++ return (__a == 0);
++}
+
+-#define vqrshrn_high_n_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqrshrn2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_u32 (uint32x4_t __a)
++{
++ return (__a == 0);
++}
+
+-#define vqrshrn_high_n_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqrshrn2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_u64 (uint64x2_t __a)
++{
++ return (__a == __AARCH64_UINT64_C (0));
++}
+
+-#define vqrshrn_high_n_u64(a, b, c) \
+- __extension__ \
+- ({ \
+- uint64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqrshrn2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++/* vceqz - scalar. */
+
+-#define vqrshrun_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrun2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzs_f32 (float32_t __a)
++{
++ return __a == 0.0f ? -1 : 0;
++}
+
+-#define vqrshrun_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrun2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzd_s64 (int64_t __a)
++{
++ return __a == 0 ? -1ll : 0ll;
++}
+
+-#define vqrshrun_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqrshrun2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzd_u64 (uint64_t __a)
++{
++ return __a == 0 ? -1ll : 0ll;
++}
+
+-#define vqshrn_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- int8x8_t a_ = (a); \
+- int8x16_t result = vcombine_s8 \
+- (a_, vcreate_s8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrn2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzd_f64 (float64_t __a)
++{
++ return __a == 0.0 ? -1ll : 0ll;
++}
+
+-#define vqshrn_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- int16x4_t a_ = (a); \
+- int16x8_t result = vcombine_s16 \
+- (a_, vcreate_s16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrn2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++/* vcge - vector. */
+
+-#define vqshrn_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- int32x2_t a_ = (a); \
+- int32x4_t result = vcombine_s32 \
+- (a_, vcreate_s32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrn2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return (uint32x2_t) (__a >= __b);
++}
+
+-#define vqshrn_high_n_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqshrn2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return (uint64x1_t) (__a >= __b);
++}
+
+-#define vqshrn_high_n_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqshrn2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_s8 (int8x8_t __a, int8x8_t __b)
++{
++ return (uint8x8_t) (__a >= __b);
++}
+
+-#define vqshrn_high_n_u64(a, b, c) \
+- __extension__ \
+- ({ \
+- uint64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("uqshrn2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_s16 (int16x4_t __a, int16x4_t __b)
++{
++ return (uint16x4_t) (__a >= __b);
++}
+
+-#define vqshrun_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrun2 %0.16b, %1.8h, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_s32 (int32x2_t __a, int32x2_t __b)
++{
++ return (uint32x2_t) (__a >= __b);
++}
+
+-#define vqshrun_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrun2 %0.8h, %1.4s, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_s64 (int64x1_t __a, int64x1_t __b)
++{
++ return (uint64x1_t) (__a >= __b);
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_u8 (uint8x8_t __a, uint8x8_t __b)
++{
++ return (__a >= __b);
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_u16 (uint16x4_t __a, uint16x4_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vqshrun_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("sqshrun2 %0.4s, %1.2d, #%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_u32 (uint32x2_t __a, uint32x2_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- int8x8_t a_ = (a); \
+- int8x16_t result = vcombine_s8 \
+- (a_, vcreate_s8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.16b,%1.8h,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_u64 (uint64x1_t __a, uint64x1_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- int16x4_t a_ = (a); \
+- int16x8_t result = vcombine_s16 \
+- (a_, vcreate_s16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.8h,%1.4s,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return (uint32x4_t) (__a >= __b);
++}
+
+-#define vrshrn_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- int32x2_t a_ = (a); \
+- int32x4_t result = vcombine_s32 \
+- (a_, vcreate_s32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.4s,%1.2d,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_f64 (float64x2_t __a, float64x2_t __b)
++{
++ return (uint64x2_t) (__a >= __b);
++}
+
+-#define vrshrn_high_n_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.16b,%1.8h,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_s8 (int8x16_t __a, int8x16_t __b)
++{
++ return (uint8x16_t) (__a >= __b);
++}
+
+-#define vrshrn_high_n_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.8h,%1.4s,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_s16 (int16x8_t __a, int16x8_t __b)
++{
++ return (uint16x8_t) (__a >= __b);
++}
+
+-#define vrshrn_high_n_u64(a, b, c) \
+- __extension__ \
+- ({ \
+- uint64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("rshrn2 %0.4s,%1.2d,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_s32 (int32x4_t __a, int32x4_t __b)
++{
++ return (uint32x4_t) (__a >= __b);
++}
+
+-#define vrshrn_n_s16(a, b) \
+- __extension__ \
+- ({ \
+- int16x8_t a_ = (a); \
+- int8x8_t result; \
+- __asm__ ("rshrn %0.8b,%1.8h,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_s64 (int64x2_t __a, int64x2_t __b)
++{
++ return (uint64x2_t) (__a >= __b);
++}
+
+-#define vrshrn_n_s32(a, b) \
+- __extension__ \
+- ({ \
+- int32x4_t a_ = (a); \
+- int16x4_t result; \
+- __asm__ ("rshrn %0.4h,%1.4s,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_u8 (uint8x16_t __a, uint8x16_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_n_s64(a, b) \
+- __extension__ \
+- ({ \
+- int64x2_t a_ = (a); \
+- int32x2_t result; \
+- __asm__ ("rshrn %0.2s,%1.2d,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_u16 (uint16x8_t __a, uint16x8_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_n_u16(a, b) \
+- __extension__ \
+- ({ \
+- uint16x8_t a_ = (a); \
+- uint8x8_t result; \
+- __asm__ ("rshrn %0.8b,%1.8h,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_u32 (uint32x4_t __a, uint32x4_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_n_u32(a, b) \
+- __extension__ \
+- ({ \
+- uint32x4_t a_ = (a); \
+- uint16x4_t result; \
+- __asm__ ("rshrn %0.4h,%1.4s,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_u64 (uint64x2_t __a, uint64x2_t __b)
++{
++ return (__a >= __b);
++}
+
+-#define vrshrn_n_u64(a, b) \
+- __extension__ \
+- ({ \
+- uint64x2_t a_ = (a); \
+- uint32x2_t result; \
+- __asm__ ("rshrn %0.2s,%1.2d,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++/* vcge - scalar. */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrsqrte_f32 (float32x2_t a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcges_f32 (float32_t __a, float32_t __b)
+ {
+- float32x2_t result;
+- __asm__ ("frsqrte %0.2s,%1.2s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __a >= __b ? -1 : 0;
++}
++
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcged_s64 (int64_t __a, int64_t __b)
++{
++ return __a >= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrsqrte_f64 (float64x1_t a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcged_u64 (uint64_t __a, uint64_t __b)
+ {
+- float64x1_t result;
+- __asm__ ("frsqrte %d0,%d1"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __a >= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrsqrte_u32 (uint32x2_t a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcged_f64 (float64_t __a, float64_t __b)
+ {
+- uint32x2_t result;
+- __asm__ ("ursqrte %0.2s,%1.2s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return __a >= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vrsqrted_f64 (float64_t a)
++/* vcgez - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_f32 (float32x2_t __a)
+ {
+- float64_t result;
+- __asm__ ("frsqrte %d0,%d1"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a >= 0.0f);
++}
++
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_f64 (float64x1_t __a)
++{
++ return (uint64x1_t) (__a[0] >= (float64x1_t) {0.0});
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrsqrteq_f32 (float32x4_t a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_s8 (int8x8_t __a)
+ {
+- float32x4_t result;
+- __asm__ ("frsqrte %0.4s,%1.4s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint8x8_t) (__a >= 0);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrsqrteq_f64 (float64x2_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_s16 (int16x4_t __a)
+ {
+- float64x2_t result;
+- __asm__ ("frsqrte %0.2d,%1.2d"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint16x4_t) (__a >= 0);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrsqrteq_u32 (uint32x4_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_s32 (int32x2_t __a)
+ {
+- uint32x4_t result;
+- __asm__ ("ursqrte %0.4s,%1.4s"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a >= 0);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vrsqrtes_f32 (float32_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_s64 (int64x1_t __a)
+ {
+- float32_t result;
+- __asm__ ("frsqrte %s0,%s1"
+- : "=w"(result)
+- : "w"(a)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a >= __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrsqrts_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_f32 (float32x4_t __a)
+ {
+- float32x2_t result;
+- __asm__ ("frsqrts %0.2s,%1.2s,%2.2s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a >= 0.0f);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vrsqrtsd_f64 (float64_t a, float64_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_f64 (float64x2_t __a)
+ {
+- float64_t result;
+- __asm__ ("frsqrts %d0,%d1,%d2"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a >= 0.0);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrsqrtsq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_s8 (int8x16_t __a)
+ {
+- float32x4_t result;
+- __asm__ ("frsqrts %0.4s,%1.4s,%2.4s"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a >= 0);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrsqrtsq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_s16 (int16x8_t __a)
+ {
+- float64x2_t result;
+- __asm__ ("frsqrts %0.2d,%1.2d,%2.2d"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint16x8_t) (__a >= 0);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vrsqrtss_f32 (float32_t a, float32_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_s32 (int32x4_t __a)
+ {
+- float32_t result;
+- __asm__ ("frsqrts %s0,%s1,%s2"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a >= 0);
+ }
+
+-#define vshrn_high_n_s16(a, b, c) \
+- __extension__ \
+- ({ \
+- int16x8_t b_ = (b); \
+- int8x8_t a_ = (a); \
+- int8x16_t result = vcombine_s8 \
+- (a_, vcreate_s8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.16b,%1.8h,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_high_n_s32(a, b, c) \
+- __extension__ \
+- ({ \
+- int32x4_t b_ = (b); \
+- int16x4_t a_ = (a); \
+- int16x8_t result = vcombine_s16 \
+- (a_, vcreate_s16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.8h,%1.4s,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_high_n_s64(a, b, c) \
+- __extension__ \
+- ({ \
+- int64x2_t b_ = (b); \
+- int32x2_t a_ = (a); \
+- int32x4_t result = vcombine_s32 \
+- (a_, vcreate_s32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.4s,%1.2d,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_high_n_u16(a, b, c) \
+- __extension__ \
+- ({ \
+- uint16x8_t b_ = (b); \
+- uint8x8_t a_ = (a); \
+- uint8x16_t result = vcombine_u8 \
+- (a_, vcreate_u8 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.16b,%1.8h,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_high_n_u32(a, b, c) \
+- __extension__ \
+- ({ \
+- uint32x4_t b_ = (b); \
+- uint16x4_t a_ = (a); \
+- uint16x8_t result = vcombine_u16 \
+- (a_, vcreate_u16 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.8h,%1.4s,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_high_n_u64(a, b, c) \
+- __extension__ \
+- ({ \
+- uint64x2_t b_ = (b); \
+- uint32x2_t a_ = (a); \
+- uint32x4_t result = vcombine_u32 \
+- (a_, vcreate_u32 \
+- (__AARCH64_UINT64_C (0x0))); \
+- __asm__ ("shrn2 %0.4s,%1.2d,#%2" \
+- : "+w"(result) \
+- : "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_s16(a, b) \
+- __extension__ \
+- ({ \
+- int16x8_t a_ = (a); \
+- int8x8_t result; \
+- __asm__ ("shrn %0.8b,%1.8h,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_s32(a, b) \
+- __extension__ \
+- ({ \
+- int32x4_t a_ = (a); \
+- int16x4_t result; \
+- __asm__ ("shrn %0.4h,%1.4s,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_s64(a, b) \
+- __extension__ \
+- ({ \
+- int64x2_t a_ = (a); \
+- int32x2_t result; \
+- __asm__ ("shrn %0.2s,%1.2d,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_u16(a, b) \
+- __extension__ \
+- ({ \
+- uint16x8_t a_ = (a); \
+- uint8x8_t result; \
+- __asm__ ("shrn %0.8b,%1.8h,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_u32(a, b) \
+- __extension__ \
+- ({ \
+- uint32x4_t a_ = (a); \
+- uint16x4_t result; \
+- __asm__ ("shrn %0.4h,%1.4s,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
+-
+-#define vshrn_n_u64(a, b) \
+- __extension__ \
+- ({ \
+- uint64x2_t a_ = (a); \
+- uint32x2_t result; \
+- __asm__ ("shrn %0.2s,%1.2d,%2" \
+- : "=w"(result) \
+- : "w"(a_), "i"(b) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_s64 (int64x2_t __a)
++{
++ return (uint64x2_t) (__a >= __AARCH64_INT64_C (0));
++}
+
+-#define vsli_n_p8(a, b, c) \
+- __extension__ \
+- ({ \
+- poly8x8_t b_ = (b); \
+- poly8x8_t a_ = (a); \
+- poly8x8_t result; \
+- __asm__ ("sli %0.8b,%2.8b,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++/* vcgez - scalar. */
+
+-#define vsli_n_p16(a, b, c) \
+- __extension__ \
+- ({ \
+- poly16x4_t b_ = (b); \
+- poly16x4_t a_ = (a); \
+- poly16x4_t result; \
+- __asm__ ("sli %0.4h,%2.4h,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezs_f32 (float32_t __a)
++{
++ return __a >= 0.0f ? -1 : 0;
++}
+
+-#define vsliq_n_p8(a, b, c) \
+- __extension__ \
+- ({ \
+- poly8x16_t b_ = (b); \
+- poly8x16_t a_ = (a); \
+- poly8x16_t result; \
+- __asm__ ("sli %0.16b,%2.16b,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezd_s64 (int64_t __a)
++{
++ return __a >= 0 ? -1ll : 0ll;
++}
+
+-#define vsliq_n_p16(a, b, c) \
+- __extension__ \
+- ({ \
+- poly16x8_t b_ = (b); \
+- poly16x8_t a_ = (a); \
+- poly16x8_t result; \
+- __asm__ ("sli %0.8h,%2.8h,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezd_f64 (float64_t __a)
++{
++ return __a >= 0.0 ? -1ll : 0ll;
++}
+
+-#define vsri_n_p8(a, b, c) \
+- __extension__ \
+- ({ \
+- poly8x8_t b_ = (b); \
+- poly8x8_t a_ = (a); \
+- poly8x8_t result; \
+- __asm__ ("sri %0.8b,%2.8b,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++/* vcgt - vector. */
+
+-#define vsri_n_p16(a, b, c) \
+- __extension__ \
+- ({ \
+- poly16x4_t b_ = (b); \
+- poly16x4_t a_ = (a); \
+- poly16x4_t result; \
+- __asm__ ("sri %0.4h,%2.4h,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_f32 (float32x2_t __a, float32x2_t __b)
++{
++ return (uint32x2_t) (__a > __b);
++}
+
+-#define vsriq_n_p8(a, b, c) \
+- __extension__ \
+- ({ \
+- poly8x16_t b_ = (b); \
+- poly8x16_t a_ = (a); \
+- poly8x16_t result; \
+- __asm__ ("sri %0.16b,%2.16b,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
+- })
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_f64 (float64x1_t __a, float64x1_t __b)
++{
++ return (uint64x1_t) (__a > __b);
++}
+
+-#define vsriq_n_p16(a, b, c) \
+- __extension__ \
+- ({ \
+- poly16x8_t b_ = (b); \
+- poly16x8_t a_ = (a); \
+- poly16x8_t result; \
+- __asm__ ("sri %0.8h,%2.8h,%3" \
+- : "=w"(result) \
+- : "0"(a_), "w"(b_), "i"(c) \
+- : /* No clobbers */); \
+- result; \
- })
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_s8 (int8x8_t __a, int8x8_t __b)
++{
++ return (uint8x8_t) (__a > __b);
++}
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtst_p8 (poly8x8_t a, poly8x8_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- uint8x8_t result;
+- __asm__ ("cmtst %0.8b, %1.8b, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint16x4_t) (__a > __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vtst_p16 (poly16x4_t a, poly16x4_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- uint16x4_t result;
+- __asm__ ("cmtst %0.4h, %1.4h, %2.4h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a > __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vtstq_p8 (poly8x16_t a, poly8x16_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- uint8x16_t result;
+- __asm__ ("cmtst %0.16b, %1.16b, %2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a > __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vtstq_p16 (poly16x8_t a, poly16x8_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- uint16x8_t result;
+- __asm__ ("cmtst %0.8h, %1.8h, %2.8h"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a > __b);
+ }
+
+-/* End of temporary inline asm implementations. */
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_u16 (uint16x4_t __a, uint16x4_t __b)
++{
++ return (__a > __b);
++}
+
+-/* Start of temporary inline asm for vldn, vstn and friends. */
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_u32 (uint32x2_t __a, uint32x2_t __b)
++{
++ return (__a > __b);
++}
+
+-/* Create struct element types for duplicating loads.
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_u64 (uint64x1_t __a, uint64x1_t __b)
++{
++ return (__a > __b);
++}
+
+- Create 2 element structures of:
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return (uint32x4_t) (__a > __b);
++}
+
+- +------+----+----+----+----+
+- | | 8 | 16 | 32 | 64 |
+- +------+----+----+----+----+
+- |int | Y | Y | N | N |
+- +------+----+----+----+----+
+- |uint | Y | Y | N | N |
+- +------+----+----+----+----+
+- |float | - | Y | N | N |
+- +------+----+----+----+----+
+- |poly | Y | Y | - | - |
+- +------+----+----+----+----+
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_f64 (float64x2_t __a, float64x2_t __b)
++{
++ return (uint64x2_t) (__a > __b);
++}
+
+- Create 3 element structures of:
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_s8 (int8x16_t __a, int8x16_t __b)
++{
++ return (uint8x16_t) (__a > __b);
++}
+
+- +------+----+----+----+----+
+- | | 8 | 16 | 32 | 64 |
+- +------+----+----+----+----+
+- |int | Y | Y | Y | Y |
+- +------+----+----+----+----+
+- |uint | Y | Y | Y | Y |
+- +------+----+----+----+----+
+- |float | - | Y | Y | Y |
+- +------+----+----+----+----+
+- |poly | Y | Y | - | - |
+- +------+----+----+----+----+
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_s16 (int16x8_t __a, int16x8_t __b)
++{
++ return (uint16x8_t) (__a > __b);
++}
+
+- Create 4 element structures of:
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_s32 (int32x4_t __a, int32x4_t __b)
++{
++ return (uint32x4_t) (__a > __b);
++}
+
+- +------+----+----+----+----+
+- | | 8 | 16 | 32 | 64 |
+- +------+----+----+----+----+
+- |int | Y | N | N | Y |
+- +------+----+----+----+----+
+- |uint | Y | N | N | Y |
+- +------+----+----+----+----+
+- |float | - | N | N | Y |
+- +------+----+----+----+----+
+- |poly | Y | N | - | - |
+- +------+----+----+----+----+
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_s64 (int64x2_t __a, int64x2_t __b)
++{
++ return (uint64x2_t) (__a > __b);
++}
+
+- This is required for casting memory reference. */
+-#define __STRUCTN(t, sz, nelem) \
+- typedef struct t ## sz ## x ## nelem ## _t { \
+- t ## sz ## _t val[nelem]; \
+- } t ## sz ## x ## nelem ## _t;
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_u8 (uint8x16_t __a, uint8x16_t __b)
++{
++ return (__a > __b);
++}
+
+-/* 2-element structs. */
+-__STRUCTN (int, 8, 2)
+-__STRUCTN (int, 16, 2)
+-__STRUCTN (uint, 8, 2)
+-__STRUCTN (uint, 16, 2)
+-__STRUCTN (float, 16, 2)
+-__STRUCTN (poly, 8, 2)
+-__STRUCTN (poly, 16, 2)
+-/* 3-element structs. */
+-__STRUCTN (int, 8, 3)
+-__STRUCTN (int, 16, 3)
+-__STRUCTN (int, 32, 3)
+-__STRUCTN (int, 64, 3)
+-__STRUCTN (uint, 8, 3)
+-__STRUCTN (uint, 16, 3)
+-__STRUCTN (uint, 32, 3)
+-__STRUCTN (uint, 64, 3)
+-__STRUCTN (float, 16, 3)
+-__STRUCTN (float, 32, 3)
+-__STRUCTN (float, 64, 3)
+-__STRUCTN (poly, 8, 3)
+-__STRUCTN (poly, 16, 3)
+-/* 4-element structs. */
+-__STRUCTN (int, 8, 4)
+-__STRUCTN (int, 64, 4)
+-__STRUCTN (uint, 8, 4)
+-__STRUCTN (uint, 64, 4)
+-__STRUCTN (poly, 8, 4)
+-__STRUCTN (float, 64, 4)
+-#undef __STRUCTN
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_u16 (uint16x8_t __a, uint16x8_t __b)
++{
++ return (__a > __b);
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_u32 (uint32x4_t __a, uint32x4_t __b)
++{
++ return (__a > __b);
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_u64 (uint64x2_t __a, uint64x2_t __b)
++{
++ return (__a > __b);
++}
+
++/* vcgt - scalar. */
+
+-#define __ST2_LANE_FUNC(intype, largetype, ptrtype, mode, \
+- qmode, ptr_mode, funcsuffix, signedtype) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst2_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_oi __o; \
+- largetype __temp; \
+- __temp.val[0] \
+- = vcombine_##funcsuffix (__b.val[0], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[1] \
+- = vcombine_##funcsuffix (__b.val[1], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __o = __builtin_aarch64_set_qregoi##qmode (__o, \
+- (signedtype) __temp.val[0], 0); \
+- __o = __builtin_aarch64_set_qregoi##qmode (__o, \
+- (signedtype) __temp.val[1], 1); \
+- __builtin_aarch64_st2_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __o, __c); \
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgts_f32 (float32_t __a, float32_t __b)
++{
++ return __a > __b ? -1 : 0;
+ }
+
+-__ST2_LANE_FUNC (float16x4x2_t, float16x8x2_t, float16_t, v4hf, v8hf, hf, f16,
+- float16x8_t)
+-__ST2_LANE_FUNC (float32x2x2_t, float32x4x2_t, float32_t, v2sf, v4sf, sf, f32,
+- float32x4_t)
+-__ST2_LANE_FUNC (float64x1x2_t, float64x2x2_t, float64_t, df, v2df, df, f64,
+- float64x2_t)
+-__ST2_LANE_FUNC (poly8x8x2_t, poly8x16x2_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__ST2_LANE_FUNC (poly16x4x2_t, poly16x8x2_t, poly16_t, v4hi, v8hi, hi, p16,
+- int16x8_t)
+-__ST2_LANE_FUNC (int8x8x2_t, int8x16x2_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__ST2_LANE_FUNC (int16x4x2_t, int16x8x2_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__ST2_LANE_FUNC (int32x2x2_t, int32x4x2_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__ST2_LANE_FUNC (int64x1x2_t, int64x2x2_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__ST2_LANE_FUNC (uint8x8x2_t, uint8x16x2_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__ST2_LANE_FUNC (uint16x4x2_t, uint16x8x2_t, uint16_t, v4hi, v8hi, hi, u16,
+- int16x8_t)
+-__ST2_LANE_FUNC (uint32x2x2_t, uint32x4x2_t, uint32_t, v2si, v4si, si, u32,
+- int32x4_t)
+-__ST2_LANE_FUNC (uint64x1x2_t, uint64x2x2_t, uint64_t, di, v2di, di, u64,
+- int64x2_t)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtd_s64 (int64_t __a, int64_t __b)
++{
++ return __a > __b ? -1ll : 0ll;
++}
+
+-#undef __ST2_LANE_FUNC
+-#define __ST2_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst2q_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- union { intype __i; \
+- __builtin_aarch64_simd_oi __o; } __temp = { __b }; \
+- __builtin_aarch64_st2_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __temp.__o, __c); \
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtd_u64 (uint64_t __a, uint64_t __b)
++{
++ return __a > __b ? -1ll : 0ll;
+ }
+
+-__ST2_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16)
+-__ST2_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf, f32)
+-__ST2_LANE_FUNC (float64x2x2_t, float64_t, v2df, df, f64)
+-__ST2_LANE_FUNC (poly8x16x2_t, poly8_t, v16qi, qi, p8)
+-__ST2_LANE_FUNC (poly16x8x2_t, poly16_t, v8hi, hi, p16)
+-__ST2_LANE_FUNC (int8x16x2_t, int8_t, v16qi, qi, s8)
+-__ST2_LANE_FUNC (int16x8x2_t, int16_t, v8hi, hi, s16)
+-__ST2_LANE_FUNC (int32x4x2_t, int32_t, v4si, si, s32)
+-__ST2_LANE_FUNC (int64x2x2_t, int64_t, v2di, di, s64)
+-__ST2_LANE_FUNC (uint8x16x2_t, uint8_t, v16qi, qi, u8)
+-__ST2_LANE_FUNC (uint16x8x2_t, uint16_t, v8hi, hi, u16)
+-__ST2_LANE_FUNC (uint32x4x2_t, uint32_t, v4si, si, u32)
+-__ST2_LANE_FUNC (uint64x2x2_t, uint64_t, v2di, di, u64)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtd_f64 (float64_t __a, float64_t __b)
++{
++ return __a > __b ? -1ll : 0ll;
++}
+
+-#define __ST3_LANE_FUNC(intype, largetype, ptrtype, mode, \
+- qmode, ptr_mode, funcsuffix, signedtype) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst3_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_ci __o; \
+- largetype __temp; \
+- __temp.val[0] \
+- = vcombine_##funcsuffix (__b.val[0], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[1] \
+- = vcombine_##funcsuffix (__b.val[1], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[2] \
+- = vcombine_##funcsuffix (__b.val[2], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[0], 0); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[1], 1); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[2], 2); \
+- __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __o, __c); \
++/* vcgtz - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_f32 (float32x2_t __a)
++{
++ return (uint32x2_t) (__a > 0.0f);
+ }
+
+-__ST3_LANE_FUNC (float16x4x3_t, float16x8x3_t, float16_t, v4hf, v8hf, hf, f16,
+- float16x8_t)
+-__ST3_LANE_FUNC (float32x2x3_t, float32x4x3_t, float32_t, v2sf, v4sf, sf, f32,
+- float32x4_t)
+-__ST3_LANE_FUNC (float64x1x3_t, float64x2x3_t, float64_t, df, v2df, df, f64,
+- float64x2_t)
+-__ST3_LANE_FUNC (poly8x8x3_t, poly8x16x3_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__ST3_LANE_FUNC (poly16x4x3_t, poly16x8x3_t, poly16_t, v4hi, v8hi, hi, p16,
+- int16x8_t)
+-__ST3_LANE_FUNC (int8x8x3_t, int8x16x3_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__ST3_LANE_FUNC (int16x4x3_t, int16x8x3_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__ST3_LANE_FUNC (int32x2x3_t, int32x4x3_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__ST3_LANE_FUNC (int64x1x3_t, int64x2x3_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__ST3_LANE_FUNC (uint8x8x3_t, uint8x16x3_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__ST3_LANE_FUNC (uint16x4x3_t, uint16x8x3_t, uint16_t, v4hi, v8hi, hi, u16,
+- int16x8_t)
+-__ST3_LANE_FUNC (uint32x2x3_t, uint32x4x3_t, uint32_t, v2si, v4si, si, u32,
+- int32x4_t)
+-__ST3_LANE_FUNC (uint64x1x3_t, uint64x2x3_t, uint64_t, di, v2di, di, u64,
+- int64x2_t)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_f64 (float64x1_t __a)
++{
++ return (uint64x1_t) (__a > (float64x1_t) {0.0});
++}
+
+-#undef __ST3_LANE_FUNC
+-#define __ST3_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst3q_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- union { intype __i; \
+- __builtin_aarch64_simd_ci __o; } __temp = { __b }; \
+- __builtin_aarch64_st3_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __temp.__o, __c); \
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_s8 (int8x8_t __a)
++{
++ return (uint8x8_t) (__a > 0);
+ }
+
+-__ST3_LANE_FUNC (float16x8x3_t, float16_t, v8hf, hf, f16)
+-__ST3_LANE_FUNC (float32x4x3_t, float32_t, v4sf, sf, f32)
+-__ST3_LANE_FUNC (float64x2x3_t, float64_t, v2df, df, f64)
+-__ST3_LANE_FUNC (poly8x16x3_t, poly8_t, v16qi, qi, p8)
+-__ST3_LANE_FUNC (poly16x8x3_t, poly16_t, v8hi, hi, p16)
+-__ST3_LANE_FUNC (int8x16x3_t, int8_t, v16qi, qi, s8)
+-__ST3_LANE_FUNC (int16x8x3_t, int16_t, v8hi, hi, s16)
+-__ST3_LANE_FUNC (int32x4x3_t, int32_t, v4si, si, s32)
+-__ST3_LANE_FUNC (int64x2x3_t, int64_t, v2di, di, s64)
+-__ST3_LANE_FUNC (uint8x16x3_t, uint8_t, v16qi, qi, u8)
+-__ST3_LANE_FUNC (uint16x8x3_t, uint16_t, v8hi, hi, u16)
+-__ST3_LANE_FUNC (uint32x4x3_t, uint32_t, v4si, si, u32)
+-__ST3_LANE_FUNC (uint64x2x3_t, uint64_t, v2di, di, u64)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_s16 (int16x4_t __a)
++{
++ return (uint16x4_t) (__a > 0);
++}
+
+-#define __ST4_LANE_FUNC(intype, largetype, ptrtype, mode, \
+- qmode, ptr_mode, funcsuffix, signedtype) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst4_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_xi __o; \
+- largetype __temp; \
+- __temp.val[0] \
+- = vcombine_##funcsuffix (__b.val[0], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[1] \
+- = vcombine_##funcsuffix (__b.val[1], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[2] \
+- = vcombine_##funcsuffix (__b.val[2], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __temp.val[3] \
+- = vcombine_##funcsuffix (__b.val[3], \
+- vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[0], 0); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[1], 1); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[2], 2); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[3], 3); \
+- __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __o, __c); \
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_s32 (int32x2_t __a)
++{
++ return (uint32x2_t) (__a > 0);
+ }
+
+-__ST4_LANE_FUNC (float16x4x4_t, float16x8x4_t, float16_t, v4hf, v8hf, hf, f16,
+- float16x8_t)
+-__ST4_LANE_FUNC (float32x2x4_t, float32x4x4_t, float32_t, v2sf, v4sf, sf, f32,
+- float32x4_t)
+-__ST4_LANE_FUNC (float64x1x4_t, float64x2x4_t, float64_t, df, v2df, df, f64,
+- float64x2_t)
+-__ST4_LANE_FUNC (poly8x8x4_t, poly8x16x4_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__ST4_LANE_FUNC (poly16x4x4_t, poly16x8x4_t, poly16_t, v4hi, v8hi, hi, p16,
+- int16x8_t)
+-__ST4_LANE_FUNC (int8x8x4_t, int8x16x4_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__ST4_LANE_FUNC (int16x4x4_t, int16x8x4_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__ST4_LANE_FUNC (int32x2x4_t, int32x4x4_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__ST4_LANE_FUNC (int64x1x4_t, int64x2x4_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__ST4_LANE_FUNC (uint8x8x4_t, uint8x16x4_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__ST4_LANE_FUNC (uint16x4x4_t, uint16x8x4_t, uint16_t, v4hi, v8hi, hi, u16,
+- int16x8_t)
+-__ST4_LANE_FUNC (uint32x2x4_t, uint32x4x4_t, uint32_t, v2si, v4si, si, u32,
+- int32x4_t)
+-__ST4_LANE_FUNC (uint64x1x4_t, uint64x2x4_t, uint64_t, di, v2di, di, u64,
+- int64x2_t)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_s64 (int64x1_t __a)
++{
++ return (uint64x1_t) (__a > __AARCH64_INT64_C (0));
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_f32 (float32x4_t __a)
++{
++ return (uint32x4_t) (__a > 0.0f);
++}
++
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_f64 (float64x2_t __a)
++{
++ return (uint64x2_t) (__a > 0.0);
++}
++
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_s8 (int8x16_t __a)
++{
++ return (uint8x16_t) (__a > 0);
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_s16 (int16x8_t __a)
++{
++ return (uint16x8_t) (__a > 0);
++}
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_s32 (int32x4_t __a)
++{
++ return (uint32x4_t) (__a > 0);
++}
+
+-#undef __ST4_LANE_FUNC
+-#define __ST4_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \
+-__extension__ static __inline void \
+-__attribute__ ((__always_inline__)) \
+-vst4q_lane_ ## funcsuffix (ptrtype *__ptr, \
+- intype __b, const int __c) \
+-{ \
+- union { intype __i; \
+- __builtin_aarch64_simd_xi __o; } __temp = { __b }; \
+- __builtin_aarch64_st4_lane##mode ((__builtin_aarch64_simd_ ## ptr_mode *) \
+- __ptr, __temp.__o, __c); \
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_s64 (int64x2_t __a)
++{
++ return (uint64x2_t) (__a > __AARCH64_INT64_C (0));
+ }
+
+-__ST4_LANE_FUNC (float16x8x4_t, float16_t, v8hf, hf, f16)
+-__ST4_LANE_FUNC (float32x4x4_t, float32_t, v4sf, sf, f32)
+-__ST4_LANE_FUNC (float64x2x4_t, float64_t, v2df, df, f64)
+-__ST4_LANE_FUNC (poly8x16x4_t, poly8_t, v16qi, qi, p8)
+-__ST4_LANE_FUNC (poly16x8x4_t, poly16_t, v8hi, hi, p16)
+-__ST4_LANE_FUNC (int8x16x4_t, int8_t, v16qi, qi, s8)
+-__ST4_LANE_FUNC (int16x8x4_t, int16_t, v8hi, hi, s16)
+-__ST4_LANE_FUNC (int32x4x4_t, int32_t, v4si, si, s32)
+-__ST4_LANE_FUNC (int64x2x4_t, int64_t, v2di, di, s64)
+-__ST4_LANE_FUNC (uint8x16x4_t, uint8_t, v16qi, qi, u8)
+-__ST4_LANE_FUNC (uint16x8x4_t, uint16_t, v8hi, hi, u16)
+-__ST4_LANE_FUNC (uint32x4x4_t, uint32_t, v4si, si, u32)
+-__ST4_LANE_FUNC (uint64x2x4_t, uint64_t, v2di, di, u64)
++/* vcgtz - scalar. */
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vaddlv_s32 (int32x2_t a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzs_f32 (float32_t __a)
+ {
+- int64_t result;
+- __asm__ ("saddlp %0.1d, %1.2s" : "=w"(result) : "w"(a) : );
+- return result;
++ return __a > 0.0f ? -1 : 0;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vaddlv_u32 (uint32x2_t a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzd_s64 (int64_t __a)
+ {
+- uint64_t result;
+- __asm__ ("uaddlp %0.1d, %1.2s" : "=w"(result) : "w"(a) : );
+- return result;
++ return __a > 0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqdmulh_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzd_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_sqdmulh_laneqv4hi (__a, __b, __c);
++ return __a > 0.0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqdmulh_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __c)
++/* vcle - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __builtin_aarch64_sqdmulh_laneqv2si (__a, __b, __c);
++ return (uint32x2_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqdmulhq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- return __builtin_aarch64_sqdmulh_laneqv8hi (__a, __b, __c);
++ return (uint64x1_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_sqdmulh_laneqv4si (__a, __b, __c);
++ return (uint8x8_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmulh_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_sqrdmulh_laneqv4hi (__a, __b, __c);
++ return (uint16x4_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmulh_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_sqrdmulh_laneqv2si (__a, __b, __c);
++ return (uint32x2_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmulhq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return __builtin_aarch64_sqrdmulh_laneqv8hi (__a, __b, __c);
++ return (uint64x1_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
++ return (__a <= __b);
+ }
+
+-/* Table intrinsics. */
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_u16 (uint16x4_t __a, uint16x4_t __b)
++{
++ return (__a <= __b);
++}
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbl1_p8 (poly8x16_t a, uint8x8_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- poly8x8_t result;
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbl1_s8 (int8x16_t a, uint8x8_t b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+- int8x8_t result;
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbl1_u8 (uint8x16_t a, uint8x8_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- uint8x8_t result;
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a <= __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbl1q_p8 (poly8x16_t a, uint8x16_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- poly8x16_t result;
+- __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbl1q_s8 (int8x16_t a, uint8x16_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- int8x16_t result;
+- __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint8x16_t) (__a <= __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbl1q_u8 (uint8x16_t a, uint8x16_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- uint8x16_t result;
+- __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
+- : "=w"(result)
+- : "w"(a), "w"(b)
+- : /* No clobbers */);
+- return result;
++ return (uint16x8_t) (__a <= __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbx1_s8 (int8x8_t r, int8x16_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- int8x8_t result = r;
+- __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a <= __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbx1_u8 (uint8x8_t r, uint8x16_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- uint8x8_t result = r;
+- __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a <= __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbx1_p8 (poly8x8_t r, poly8x16_t tab, uint8x8_t idx)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- poly8x8_t result = r;
+- __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbx1q_s8 (int8x16_t r, int8x16_t tab, uint8x16_t idx)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- int8x16_t result = r;
+- __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbx1q_u8 (uint8x16_t r, uint8x16_t tab, uint8x16_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- uint8x16_t result = r;
+- __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbx1q_p8 (poly8x16_t r, poly8x16_t tab, uint8x16_t idx)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- poly8x16_t result = r;
+- __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
+- : "+w"(result)
+- : "w"(tab), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (__a <= __b);
+ }
+
+-/* V7 legacy table intrinsics. */
++/* vcle - scalar. */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbl1_s8 (int8x8_t tab, int8x8_t idx)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcles_f32 (float32_t __a, float32_t __b)
+ {
+- int8x8_t result;
+- int8x16_t temp = vcombine_s8 (tab, vcreate_s8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return __a <= __b ? -1 : 0;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbl1_u8 (uint8x8_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcled_s64 (int64_t __a, int64_t __b)
+ {
+- uint8x8_t result;
+- uint8x16_t temp = vcombine_u8 (tab, vcreate_u8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return __a <= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbl1_p8 (poly8x8_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcled_u64 (uint64_t __a, uint64_t __b)
+ {
+- poly8x8_t result;
+- poly8x16_t temp = vcombine_p8 (tab, vcreate_p8 (__AARCH64_UINT64_C (0x0)));
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return __a <= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbl2_s8 (int8x8x2_t tab, int8x8_t idx)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcled_f64 (float64_t __a, float64_t __b)
+ {
+- int8x8_t result;
+- int8x16_t temp = vcombine_s8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return __a <= __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbl2_u8 (uint8x8x2_t tab, uint8x8_t idx)
++/* vclez - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_f32 (float32x2_t __a)
+ {
+- uint8x8_t result;
+- uint8x16_t temp = vcombine_u8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint32x2_t) (__a <= 0.0f);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbl2_p8 (poly8x8x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_f64 (float64x1_t __a)
+ {
+- poly8x8_t result;
+- poly8x16_t temp = vcombine_p8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
+- : "=w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint64x1_t) (__a <= (float64x1_t) {0.0});
++}
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_s8 (int8x8_t __a)
++{
++ return (uint8x8_t) (__a <= 0);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbl3_s8 (int8x8x3_t tab, int8x8_t idx)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_s16 (int16x4_t __a)
+ {
+- int8x8_t result;
+- int8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_s8 (tab.val[2], vcreate_s8 (__AARCH64_UINT64_C (0x0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = __builtin_aarch64_tbl3v8qi (__o, idx);
+- return result;
++ return (uint16x4_t) (__a <= 0);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbl3_u8 (uint8x8x3_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_s32 (int32x2_t __a)
+ {
+- uint8x8_t result;
+- uint8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_u8 (tab.val[2], vcreate_u8 (__AARCH64_UINT64_C (0x0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+- return result;
++ return (uint32x2_t) (__a <= 0);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbl3_p8 (poly8x8x3_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_s64 (int64x1_t __a)
+ {
+- poly8x8_t result;
+- poly8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_p8 (tab.val[2], vcreate_p8 (__AARCH64_UINT64_C (0x0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+- return result;
++ return (uint64x1_t) (__a <= __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbl4_s8 (int8x8x4_t tab, int8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_f32 (float32x4_t __a)
+ {
+- int8x8_t result;
+- int8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_s8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_s8 (tab.val[2], tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = __builtin_aarch64_tbl3v8qi (__o, idx);
+- return result;
++ return (uint32x4_t) (__a <= 0.0f);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbl4_u8 (uint8x8x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_f64 (float64x2_t __a)
+ {
+- uint8x8_t result;
+- uint8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_u8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_u8 (tab.val[2], tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+- return result;
++ return (uint64x2_t) (__a <= 0.0);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbl4_p8 (poly8x8x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_s8 (int8x16_t __a)
+ {
+- poly8x8_t result;
+- poly8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_p8 (tab.val[0], tab.val[1]);
+- temp.val[1] = vcombine_p8 (tab.val[2], tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+- return result;
++ return (uint8x16_t) (__a <= 0);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbx2_s8 (int8x8_t r, int8x8x2_t tab, int8x8_t idx)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_s16 (int16x8_t __a)
+ {
+- int8x8_t result = r;
+- int8x16_t temp = vcombine_s8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
+- : "+w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint16x8_t) (__a <= 0);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbx2_u8 (uint8x8_t r, uint8x8x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_s32 (int32x4_t __a)
+ {
+- uint8x8_t result = r;
+- uint8x16_t temp = vcombine_u8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
+- : "+w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint32x4_t) (__a <= 0);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbx2_p8 (poly8x8_t r, poly8x8x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_s64 (int64x2_t __a)
+ {
+- poly8x8_t result = r;
+- poly8x16_t temp = vcombine_p8 (tab.val[0], tab.val[1]);
+- __asm__ ("tbx %0.8b, {%1.16b}, %2.8b"
+- : "+w"(result)
+- : "w"(temp), "w"(idx)
+- : /* No clobbers */);
+- return result;
++ return (uint64x2_t) (__a <= __AARCH64_INT64_C (0));
+ }
+
+-/* End of temporary inline asm. */
+-
+-/* Start of optimal implementations in approved order. */
+-
+-/* vabs */
++/* vclez - scalar. */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vabs_f32 (float32x2_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezs_f32 (float32_t __a)
+ {
+- return __builtin_aarch64_absv2sf (__a);
++ return __a <= 0.0f ? -1 : 0;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vabs_f64 (float64x1_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezd_s64 (int64_t __a)
+ {
+- return (float64x1_t) {__builtin_fabs (__a[0])};
++ return __a <= 0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vabs_s8 (int8x8_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezd_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_absv8qi (__a);
++ return __a <= 0.0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vabs_s16 (int16x4_t __a)
++/* vclt - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __builtin_aarch64_absv4hi (__a);
++ return (uint32x2_t) (__a < __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vabs_s32 (int32x2_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- return __builtin_aarch64_absv2si (__a);
++ return (uint64x1_t) (__a < __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vabs_s64 (int64x1_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return (int64x1_t) {__builtin_aarch64_absdi (__a[0])};
++ return (uint8x8_t) (__a < __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vabsq_f32 (float32x4_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_absv4sf (__a);
++ return (uint16x4_t) (__a < __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vabsq_f64 (float64x2_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_absv2df (__a);
++ return (uint32x2_t) (__a < __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vabsq_s8 (int8x16_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return __builtin_aarch64_absv16qi (__a);
++ return (uint64x1_t) (__a < __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vabsq_s16 (int16x8_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- return __builtin_aarch64_absv8hi (__a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vabsq_s32 (int32x4_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- return __builtin_aarch64_absv4si (__a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vabsq_s64 (int64x2_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- return __builtin_aarch64_absv2di (__a);
++ return (__a < __b);
+ }
+
+-/* vadd */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vaddd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+- return __a + __b;
++ return (__a < __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vaddd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- return __a + __b;
++ return (uint32x4_t) (__a < __b);
+ }
+
+-/* vaddv */
+-
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vaddv_s8 (int8x8_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v8qi (__a);
++ return (uint64x2_t) (__a < __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vaddv_s16 (int16x4_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v4hi (__a);
++ return (uint8x16_t) (__a < __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vaddv_s32 (int32x2_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v2si (__a);
++ return (uint16x8_t) (__a < __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vaddv_u8 (uint8x8_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return (uint8_t) __builtin_aarch64_reduc_plus_scal_v8qi ((int8x8_t) __a);
++ return (uint32x4_t) (__a < __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vaddv_u16 (uint16x4_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- return (uint16_t) __builtin_aarch64_reduc_plus_scal_v4hi ((int16x4_t) __a);
++ return (uint64x2_t) (__a < __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vaddv_u32 (uint32x2_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- return (int32_t) __builtin_aarch64_reduc_plus_scal_v2si ((int32x2_t) __a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vaddvq_s8 (int8x16_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v16qi (__a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vaddvq_s16 (int16x8_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v8hi (__a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vaddvq_s32 (int32x4_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v4si (__a);
++ return (__a < __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vaddvq_s64 (int64x2_t __a)
++/* vclt - scalar. */
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclts_f32 (float32_t __a, float32_t __b)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v2di (__a);
++ return __a < __b ? -1 : 0;
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vaddvq_u8 (uint8x16_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltd_s64 (int64_t __a, int64_t __b)
+ {
+- return (uint8_t) __builtin_aarch64_reduc_plus_scal_v16qi ((int8x16_t) __a);
++ return __a < __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vaddvq_u16 (uint16x8_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltd_u64 (uint64_t __a, uint64_t __b)
+ {
+- return (uint16_t) __builtin_aarch64_reduc_plus_scal_v8hi ((int16x8_t) __a);
++ return __a < __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vaddvq_u32 (uint32x4_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltd_f64 (float64_t __a, float64_t __b)
+ {
+- return (uint32_t) __builtin_aarch64_reduc_plus_scal_v4si ((int32x4_t) __a);
++ return __a < __b ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vaddvq_u64 (uint64x2_t __a)
++/* vcltz - vector. */
++
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_f32 (float32x2_t __a)
+ {
+- return (uint64_t) __builtin_aarch64_reduc_plus_scal_v2di ((int64x2_t) __a);
++ return (uint32x2_t) (__a < 0.0f);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vaddv_f32 (float32x2_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v2sf (__a);
++ return (uint64x1_t) (__a < (float64x1_t) {0.0});
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vaddvq_f32 (float32x4_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v4sf (__a);
++ return (uint8x8_t) (__a < 0);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vaddvq_f64 (float64x2_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_s16 (int16x4_t __a)
+ {
+- return __builtin_aarch64_reduc_plus_scal_v2df (__a);
++ return (uint16x4_t) (__a < 0);
+ }
+
+-/* vbsl */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vbsl_f32 (uint32x2_t __a, float32x2_t __b, float32x2_t __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_s32 (int32x2_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2sf_suss (__a, __b, __c);
++ return (uint32x2_t) (__a < 0);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vbsl_f64 (uint64x1_t __a, float64x1_t __b, float64x1_t __c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_s64 (int64x1_t __a)
+ {
+- return (float64x1_t)
+- { __builtin_aarch64_simd_bsldf_suss (__a[0], __b[0], __c[0]) };
++ return (uint64x1_t) (__a < __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vbsl_p8 (uint8x8_t __a, poly8x8_t __b, poly8x8_t __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8qi_pupp (__a, __b, __c);
++ return (uint32x4_t) (__a < 0.0f);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4hi_pupp (__a, __b, __c);
++ return (uint64x2_t) (__a < 0.0);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vbsl_s8 (uint8x8_t __a, int8x8_t __b, int8x8_t __c)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8qi_suss (__a, __b, __c);
++ return (uint8x16_t) (__a < 0);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vbsl_s16 (uint16x4_t __a, int16x4_t __b, int16x4_t __c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_s16 (int16x8_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4hi_suss (__a, __b, __c);
++ return (uint16x8_t) (__a < 0);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vbsl_s32 (uint32x2_t __a, int32x2_t __b, int32x2_t __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_s32 (int32x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2si_suss (__a, __b, __c);
++ return (uint32x4_t) (__a < 0);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vbsl_s64 (uint64x1_t __a, int64x1_t __b, int64x1_t __c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_s64 (int64x2_t __a)
+ {
+- return (int64x1_t)
+- {__builtin_aarch64_simd_bsldi_suss (__a[0], __b[0], __c[0])};
++ return (uint64x2_t) (__a < __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vbsl_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
++/* vcltz - scalar. */
++
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzs_f32 (float32_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8qi_uuuu (__a, __b, __c);
++ return __a < 0.0f ? -1 : 0;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vbsl_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzd_s64 (int64_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4hi_uuuu (__a, __b, __c);
++ return __a < 0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vbsl_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzd_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2si_uuuu (__a, __b, __c);
++ return __a < 0.0 ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vbsl_u64 (uint64x1_t __a, uint64x1_t __b, uint64x1_t __c)
++/* vcls. */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcls_s8 (int8x8_t __a)
+ {
+- return (uint64x1_t)
+- {__builtin_aarch64_simd_bsldi_uuuu (__a[0], __b[0], __c[0])};
++ return __builtin_aarch64_clrsbv8qi (__a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vbslq_f32 (uint32x4_t __a, float32x4_t __b, float32x4_t __c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcls_s16 (int16x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4sf_suss (__a, __b, __c);
++ return __builtin_aarch64_clrsbv4hi (__a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vbslq_f64 (uint64x2_t __a, float64x2_t __b, float64x2_t __c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcls_s32 (int32x2_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2df_suss (__a, __b, __c);
++ return __builtin_aarch64_clrsbv2si (__a);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vbslq_p8 (uint8x16_t __a, poly8x16_t __b, poly8x16_t __c)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclsq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_simd_bslv16qi_pupp (__a, __b, __c);
++ return __builtin_aarch64_clrsbv16qi (__a);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vbslq_p16 (uint16x8_t __a, poly16x8_t __b, poly16x8_t __c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclsq_s16 (int16x8_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8hi_pupp (__a, __b, __c);
++ return __builtin_aarch64_clrsbv8hi (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vbslq_s8 (uint8x16_t __a, int8x16_t __b, int8x16_t __c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclsq_s32 (int32x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv16qi_suss (__a, __b, __c);
++ return __builtin_aarch64_clrsbv4si (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vbslq_s16 (uint16x8_t __a, int16x8_t __b, int16x8_t __c)
++/* vclz. */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8hi_suss (__a, __b, __c);
++ return __builtin_aarch64_clzv8qi (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vbslq_s32 (uint32x4_t __a, int32x4_t __b, int32x4_t __c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_s16 (int16x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4si_suss (__a, __b, __c);
++ return __builtin_aarch64_clzv4hi (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vbslq_s64 (uint64x2_t __a, int64x2_t __b, int64x2_t __c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_s32 (int32x2_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2di_suss (__a, __b, __c);
++ return __builtin_aarch64_clzv2si (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vbslq_u8 (uint8x16_t __a, uint8x16_t __b, uint8x16_t __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_u8 (uint8x8_t __a)
+ {
+- return __builtin_aarch64_simd_bslv16qi_uuuu (__a, __b, __c);
++ return (uint8x8_t)__builtin_aarch64_clzv8qi ((int8x8_t)__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vbslq_u16 (uint16x8_t __a, uint16x8_t __b, uint16x8_t __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_u16 (uint16x4_t __a)
+ {
+- return __builtin_aarch64_simd_bslv8hi_uuuu (__a, __b, __c);
++ return (uint16x4_t)__builtin_aarch64_clzv4hi ((int16x4_t)__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vbslq_u32 (uint32x4_t __a, uint32x4_t __b, uint32x4_t __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclz_u32 (uint32x2_t __a)
+ {
+- return __builtin_aarch64_simd_bslv4si_uuuu (__a, __b, __c);
++ return (uint32x2_t)__builtin_aarch64_clzv2si ((int32x2_t)__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_simd_bslv2di_uuuu (__a, __b, __c);
++ return __builtin_aarch64_clzv16qi (__a);
+ }
+
+-/* ARMv8.1 instrinsics. */
+-#pragma GCC push_options
+-#pragma GCC target ("arch=armv8.1-a")
+-
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_s16 (int16x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
++ return __builtin_aarch64_clzv8hi (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_s32 (int32x4_t __a)
+ {
+- return __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
++ return __builtin_aarch64_clzv4si (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_u8 (uint8x16_t __a)
+ {
+- return __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
++ return (uint8x16_t)__builtin_aarch64_clzv16qi ((int8x16_t)__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_u16 (uint16x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
++ return (uint16x8_t)__builtin_aarch64_clzv8hi ((int16x8_t)__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclzq_u32 (uint32x4_t __a)
+ {
+- return __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
++ return (uint32x4_t)__builtin_aarch64_clzv4si ((int32x4_t)__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
++/* vcnt. */
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcnt_p8 (poly8x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
++ return (poly8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcnt_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmlshv8hi (__a, __b, __c);
++ return __builtin_aarch64_popcountv8qi (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcnt_u8 (uint8x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
++ return (uint8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcntq_p8 (poly8x16_t __a)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
++ return (poly8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcntq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
++ return __builtin_aarch64_popcountv16qi (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcntq_u8 (uint8x16_t __a)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
++ return (uint8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
++/* vcopy_lane. */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_f32 (float32x2_t __a, const int __lane1,
++ float32x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_f64 (float64x1_t __a, const int __lane1,
++ float64x1_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_p8 (poly8x8_t __a, const int __lane1,
++ poly8x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlshq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_p16 (poly16x4_t __a, const int __lane1,
++ poly16x4_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqv8hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlshq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_s8 (int8x8_t __a, const int __lane1,
++ int8x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqv4si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_s16 (int16x4_t __a, const int __lane1,
++ int16x4_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanev4hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_s32 (int32x2_t __a, const int __lane1,
++ int32x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanev2si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_s64 (int64x1_t __a, const int __lane1,
++ int64x1_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanev8hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_u8 (uint8x8_t __a, const int __lane1,
++ uint8x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanev4si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlahh_s16 (int16_t __a, int16_t __b, int16_t __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_u16 (uint16x4_t __a, const int __lane1,
++ uint16x4_t __b, const int __lane2)
+ {
+- return (int16_t) __builtin_aarch64_sqrdmlahhi (__a, __b, __c);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlahh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_u32 (uint32x2_t __a, const int __lane1,
++ uint32x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanehi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlahh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_lane_u64 (uint64x1_t __a, const int __lane1,
++ uint64x1_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqhi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlahs_s32 (int32_t __a, int32_t __b, int32_t __c)
++/* vcopy_laneq. */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_f32 (float32x2_t __a, const int __lane1,
++ float32x4_t __b, const int __lane2)
+ {
+- return (int32_t) __builtin_aarch64_sqrdmlahsi (__a, __b, __c);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlahs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_f64 (float64x1_t __a, const int __lane1,
++ float64x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_lanesi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlahs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_p8 (poly8x8_t __a, const int __lane1,
++ poly8x16_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlah_laneqsi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_p16 (poly16x4_t __a, const int __lane1,
++ poly16x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanev4hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_s8 (int8x8_t __a, const int __lane1,
++ int8x16_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanev2si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_s16 (int16x4_t __a, const int __lane1,
++ int16x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanev8hi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_s32 (int32x2_t __a, const int __lane1,
++ int32x4_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanev4si (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlshh_s16 (int16_t __a, int16_t __b, int16_t __c)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_s64 (int64x1_t __a, const int __lane1,
++ int64x2_t __b, const int __lane2)
+ {
+- return (int16_t) __builtin_aarch64_sqrdmlshhi (__a, __b, __c);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlshh_lane_s16 (int16_t __a, int16_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_u8 (uint8x8_t __a, const int __lane1,
++ uint8x16_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanehi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmlshh_laneq_s16 (int16_t __a, int16_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_u16 (uint16x4_t __a, const int __lane1,
++ uint16x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqhi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlshs_s32 (int32_t __a, int32_t __b, int32_t __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_u32 (uint32x2_t __a, const int __lane1,
++ uint32x4_t __b, const int __lane2)
+ {
+- return (int32_t) __builtin_aarch64_sqrdmlshsi (__a, __b, __c);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlshs_lane_s32 (int32_t __a, int32_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopy_laneq_u64 (uint64x1_t __a, const int __lane1,
++ uint64x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_lanesi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
++/* vcopyq_lane. */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_f32 (float32x4_t __a, const int __lane1,
++ float32x2_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_sqrdmlsh_laneqsi (__a, __b, __c, __d);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+-#pragma GCC pop_options
+
+-#pragma GCC push_options
+-#pragma GCC target ("+nothing+crypto")
+-/* vaes */
+-
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaeseq_u8 (uint8x16_t data, uint8x16_t key)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_f64 (float64x2_t __a, const int __lane1,
++ float64x1_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_crypto_aesev16qi_uuu (data, key);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesdq_u8 (uint8x16_t data, uint8x16_t key)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_p8 (poly8x16_t __a, const int __lane1,
++ poly8x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_crypto_aesdv16qi_uuu (data, key);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesmcq_u8 (uint8x16_t data)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_p16 (poly16x8_t __a, const int __lane1,
++ poly16x4_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_crypto_aesmcv16qi_uu (data);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesimcq_u8 (uint8x16_t data)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_s8 (int8x16_t __a, const int __lane1,
++ int8x8_t __b, const int __lane2)
+ {
+- return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+-#pragma GCC pop_options
+
+-/* vcage */
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_s16 (int16x8_t __a, const int __lane1,
++ int16x4_t __b, const int __lane2)
++{
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
++}
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcage_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_s32 (int32x4_t __a, const int __lane1,
++ int32x2_t __b, const int __lane2)
+ {
+- return vabs_f64 (__a) >= vabs_f64 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcages_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_s64 (int64x2_t __a, const int __lane1,
++ int64x1_t __b, const int __lane2)
+ {
+- return __builtin_fabsf (__a) >= __builtin_fabsf (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcage_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_u8 (uint8x16_t __a, const int __lane1,
++ uint8x8_t __b, const int __lane2)
+ {
+- return vabs_f32 (__a) >= vabs_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcageq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_u16 (uint16x8_t __a, const int __lane1,
++ uint16x4_t __b, const int __lane2)
+ {
+- return vabsq_f32 (__a) >= vabsq_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcaged_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_u32 (uint32x4_t __a, const int __lane1,
++ uint32x2_t __b, const int __lane2)
+ {
+- return __builtin_fabs (__a) >= __builtin_fabs (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcageq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_lane_u64 (uint64x2_t __a, const int __lane1,
++ uint64x1_t __b, const int __lane2)
+ {
+- return vabsq_f64 (__a) >= vabsq_f64 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-/* vcagt */
++/* vcopyq_laneq. */
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcagts_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_f32 (float32x4_t __a, const int __lane1,
++ float32x4_t __b, const int __lane2)
+ {
+- return __builtin_fabsf (__a) > __builtin_fabsf (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcagt_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_f64 (float64x2_t __a, const int __lane1,
++ float64x2_t __b, const int __lane2)
+ {
+- return vabs_f32 (__a) > vabs_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcagt_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_p8 (poly8x16_t __a, const int __lane1,
++ poly8x16_t __b, const int __lane2)
+ {
+- return vabs_f64 (__a) > vabs_f64 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcagtq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_p16 (poly16x8_t __a, const int __lane1,
++ poly16x8_t __b, const int __lane2)
+ {
+- return vabsq_f32 (__a) > vabsq_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcagtd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_s8 (int8x16_t __a, const int __lane1,
++ int8x16_t __b, const int __lane2)
+ {
+- return __builtin_fabs (__a) > __builtin_fabs (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcagtq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_s16 (int16x8_t __a, const int __lane1,
++ int16x8_t __b, const int __lane2)
+ {
+- return vabsq_f64 (__a) > vabsq_f64 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-/* vcale */
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_s32 (int32x4_t __a, const int __lane1,
++ int32x4_t __b, const int __lane2)
++{
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
++}
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcale_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_s64 (int64x2_t __a, const int __lane1,
++ int64x2_t __b, const int __lane2)
+ {
+- return vabs_f32 (__a) <= vabs_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcale_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_u8 (uint8x16_t __a, const int __lane1,
++ uint8x16_t __b, const int __lane2)
+ {
+- return vabs_f64 (__a) <= vabs_f64 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcaled_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_u16 (uint16x8_t __a, const int __lane1,
++ uint16x8_t __b, const int __lane2)
+ {
+- return __builtin_fabs (__a) <= __builtin_fabs (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcales_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_u32 (uint32x4_t __a, const int __lane1,
++ uint32x4_t __b, const int __lane2)
+ {
+- return __builtin_fabsf (__a) <= __builtin_fabsf (__b) ? -1 : 0;
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcaleq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcopyq_laneq_u64 (uint64x2_t __a, const int __lane1,
++ uint64x2_t __b, const int __lane2)
+ {
+- return vabsq_f32 (__a) <= vabsq_f32 (__b);
++ return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
++ __a, __lane1);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcaleq_f64 (float64x2_t __a, float64x2_t __b)
++/* vcvt (double -> float). */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f16_f32 (float32x4_t __a)
+ {
+- return vabsq_f64 (__a) <= vabsq_f64 (__b);
++ return __builtin_aarch64_float_truncate_lo_v4hf (__a);
+ }
+
+-/* vcalt */
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
++{
++ return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
++}
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcalt_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f32_f64 (float64x2_t __a)
+ {
+- return vabs_f32 (__a) < vabs_f32 (__b);
++ return __builtin_aarch64_float_truncate_lo_v2sf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcalt_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
+ {
+- return vabs_f64 (__a) < vabs_f64 (__b);
++ return __builtin_aarch64_float_truncate_hi_v4sf (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcaltd_f64 (float64_t __a, float64_t __b)
++/* vcvt (float -> double). */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f32_f16 (float16x4_t __a)
+ {
+- return __builtin_fabs (__a) < __builtin_fabs (__b) ? -1 : 0;
++ return __builtin_aarch64_float_extend_lo_v4sf (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcaltq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f64_f32 (float32x2_t __a)
+ {
+- return vabsq_f32 (__a) < vabsq_f32 (__b);
++
++ return __builtin_aarch64_float_extend_lo_v2df (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcaltq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_high_f32_f16 (float16x8_t __a)
+ {
+- return vabsq_f64 (__a) < vabsq_f64 (__b);
++ return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcalts_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_high_f64_f32 (float32x4_t __a)
+ {
+- return __builtin_fabsf (__a) < __builtin_fabsf (__b) ? -1 : 0;
++ return __builtin_aarch64_vec_unpacks_hi_v4sf (__a);
+ }
+
+-/* vceq - vector. */
++/* vcvt (<u>fixed-point -> float). */
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceq_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_n_f64_s64 (int64_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a == __b);
++ return __builtin_aarch64_scvtfdi (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceq_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_n_f64_u64 (uint64_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a == __b);
++ return __builtin_aarch64_ucvtfdi_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceq_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_n_f32_s32 (int32_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a == __b);
++ return __builtin_aarch64_scvtfsi (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceq_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_n_f32_u32 (uint32_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a == __b);
++ return __builtin_aarch64_ucvtfsi_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vceq_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f32_s32 (int32x2_t __a, const int __b)
+ {
+- return (uint16x4_t) (__a == __b);
++ return __builtin_aarch64_scvtfv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceq_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f32_u32 (uint32x2_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a == __b);
++ return __builtin_aarch64_ucvtfv2si_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceq_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f64_s64 (int64x1_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a == __b);
++ return (float64x1_t)
++ { __builtin_aarch64_scvtfdi (vget_lane_s64 (__a, 0), __b) };
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceq_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f64_u64 (uint64x1_t __a, const int __b)
+ {
+- return (__a == __b);
++ return (float64x1_t)
++ { __builtin_aarch64_ucvtfdi_sus (vget_lane_u64 (__a, 0), __b) };
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vceq_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f32_s32 (int32x4_t __a, const int __b)
+ {
+- return (__a == __b);
++ return __builtin_aarch64_scvtfv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceq_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f32_u32 (uint32x4_t __a, const int __b)
+ {
+- return (__a == __b);
++ return __builtin_aarch64_ucvtfv4si_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceq_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f64_s64 (int64x2_t __a, const int __b)
+ {
+- return (__a == __b);
++ return __builtin_aarch64_scvtfv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f64_u64 (uint64x2_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a == __b);
++ return __builtin_aarch64_ucvtfv2di_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqq_f64 (float64x2_t __a, float64x2_t __b)
++/* vcvt (float -> <u>fixed-point). */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_n_s64_f64 (float64_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a == __b);
++ return __builtin_aarch64_fcvtzsdf (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqq_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_n_u64_f64 (float64_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a == __b);
++ return __builtin_aarch64_fcvtzudf_uss (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_n_s32_f32 (float32_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a == __b);
++ return __builtin_aarch64_fcvtzssf (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vceqq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_n_u32_f32 (float32_t __a, const int __b)
+ {
+- return (uint16x8_t) (__a == __b);
++ return __builtin_aarch64_fcvtzusf_uss (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_s32_f32 (float32x2_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a == __b);
++ return __builtin_aarch64_fcvtzsv2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_u32_f32 (float32x2_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a == __b);
++ return __builtin_aarch64_fcvtzuv2sf_uss (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_s64_f64 (float64x1_t __a, const int __b)
+ {
+- return (__a == __b);
++ return (int64x1_t)
++ { __builtin_aarch64_fcvtzsdf (vget_lane_f64 (__a, 0), __b) };
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vceqq_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_u64_f64 (float64x1_t __a, const int __b)
+ {
+- return (__a == __b);
++ return (uint64x1_t)
++ { __builtin_aarch64_fcvtzudf_uss (vget_lane_f64 (__a, 0), __b) };
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_s32_f32 (float32x4_t __a, const int __b)
+ {
+- return (__a == __b);
++ return __builtin_aarch64_fcvtzsv4sf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_u32_f32 (float32x4_t __a, const int __b)
+ {
+- return (__a == __b);
++ return __builtin_aarch64_fcvtzuv4sf_uss (__a, __b);
+ }
+
+-/* vceq - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vceqs_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_s64_f64 (float64x2_t __a, const int __b)
+ {
+- return __a == __b ? -1 : 0;
++ return __builtin_aarch64_fcvtzsv2df (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_u64_f64 (float64x2_t __a, const int __b)
+ {
+- return __a == __b ? -1ll : 0ll;
++ return __builtin_aarch64_fcvtzuv2df_uss (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqd_u64 (uint64_t __a, uint64_t __b)
++/* vcvt (<u>int -> float) */
++
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_f64_s64 (int64_t __a)
+ {
+- return __a == __b ? -1ll : 0ll;
++ return (float64_t) __a;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_f64_u64 (uint64_t __a)
+ {
+- return __a == __b ? -1ll : 0ll;
++ return (float64_t) __a;
+ }
+
+-/* vceqz - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceqz_f32 (float32x2_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_f32_s32 (int32_t __a)
+ {
+- return (uint32x2_t) (__a == 0.0f);
++ return (float32_t) __a;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceqz_f64 (float64x1_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_f32_u32 (uint32_t __a)
+ {
+- return (uint64x1_t) (__a == (float64x1_t) {0.0});
++ return (float32_t) __a;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceqz_p8 (poly8x8_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f32_s32 (int32x2_t __a)
+ {
+- return (uint8x8_t) (__a == 0);
++ return __builtin_aarch64_floatv2siv2sf (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceqz_s8 (int8x8_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f32_u32 (uint32x2_t __a)
+ {
+- return (uint8x8_t) (__a == 0);
++ return __builtin_aarch64_floatunsv2siv2sf ((int32x2_t) __a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vceqz_s16 (int16x4_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f64_s64 (int64x1_t __a)
+ {
+- return (uint16x4_t) (__a == 0);
++ return (float64x1_t) { vget_lane_s64 (__a, 0) };
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceqz_s32 (int32x2_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f64_u64 (uint64x1_t __a)
+ {
+- return (uint32x2_t) (__a == 0);
++ return (float64x1_t) { vget_lane_u64 (__a, 0) };
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceqz_s64 (int64x1_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f32_s32 (int32x4_t __a)
+ {
+- return (uint64x1_t) (__a == __AARCH64_INT64_C (0));
++ return __builtin_aarch64_floatv4siv4sf (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vceqz_u8 (uint8x8_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f32_u32 (uint32x4_t __a)
+ {
+- return (__a == 0);
++ return __builtin_aarch64_floatunsv4siv4sf ((int32x4_t) __a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vceqz_u16 (uint16x4_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f64_s64 (int64x2_t __a)
+ {
+- return (__a == 0);
++ return __builtin_aarch64_floatv2div2df (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vceqz_u32 (uint32x2_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f64_u64 (uint64x2_t __a)
+ {
+- return (__a == 0);
++ return __builtin_aarch64_floatunsv2div2df ((int64x2_t) __a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceqz_u64 (uint64x1_t __a)
++/* vcvt (float -> <u>int) */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_s64_f64 (float64_t __a)
+ {
+- return (__a == __AARCH64_UINT64_C (0));
++ return (int64_t) __a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqzq_f32 (float32x4_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtd_u64_f64 (float64_t __a)
+ {
+- return (uint32x4_t) (__a == 0.0f);
++ return (uint64_t) __a;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqzq_f64 (float64x2_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_s32_f32 (float32_t __a)
+ {
+- return (uint64x2_t) (__a == 0.0f);
++ return (int32_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqzq_p8 (poly8x16_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvts_u32_f32 (float32_t __a)
+ {
+- return (uint8x16_t) (__a == 0);
++ return (uint32_t) __a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqzq_s8 (int8x16_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_s32_f32 (float32x2_t __a)
+ {
+- return (uint8x16_t) (__a == 0);
++ return __builtin_aarch64_lbtruncv2sfv2si (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vceqzq_s16 (int16x8_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_u32_f32 (float32x2_t __a)
+ {
+- return (uint16x8_t) (__a == 0);
++ return __builtin_aarch64_lbtruncuv2sfv2si_us (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqzq_s32 (int32x4_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_s32_f32 (float32x4_t __a)
+ {
+- return (uint32x4_t) (__a == 0);
++ return __builtin_aarch64_lbtruncv4sfv4si (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqzq_s64 (int64x2_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_u32_f32 (float32x4_t __a)
+ {
+- return (uint64x2_t) (__a == __AARCH64_INT64_C (0));
++ return __builtin_aarch64_lbtruncuv4sfv4si_us (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vceqzq_u8 (uint8x16_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_s64_f64 (float64x1_t __a)
+ {
+- return (__a == 0);
++ return (int64x1_t) {vcvtd_s64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vceqzq_u16 (uint16x8_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_u64_f64 (float64x1_t __a)
+ {
+- return (__a == 0);
++ return (uint64x1_t) {vcvtd_u64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vceqzq_u32 (uint32x4_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_s64_f64 (float64x2_t __a)
+ {
+- return (__a == 0);
++ return __builtin_aarch64_lbtruncv2dfv2di (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vceqzq_u64 (uint64x2_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_u64_f64 (float64x2_t __a)
+ {
+- return (__a == __AARCH64_UINT64_C (0));
++ return __builtin_aarch64_lbtruncuv2dfv2di_us (__a);
+ }
+
+-/* vceqz - scalar. */
++/* vcvta */
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vceqzs_f32 (float32_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtad_s64_f64 (float64_t __a)
+ {
+- return __a == 0.0f ? -1 : 0;
++ return __builtin_aarch64_lrounddfdi (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqzd_s64 (int64_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtad_u64_f64 (float64_t __a)
+ {
+- return __a == 0 ? -1ll : 0ll;
++ return __builtin_aarch64_lroundudfdi_us (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqzd_u64 (uint64_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtas_s32_f32 (float32_t __a)
+ {
+- return __a == 0 ? -1ll : 0ll;
++ return __builtin_aarch64_lroundsfsi (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vceqzd_f64 (float64_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtas_u32_f32 (float32_t __a)
+ {
+- return __a == 0.0 ? -1ll : 0ll;
++ return __builtin_aarch64_lroundusfsi_us (__a);
+ }
+
+-/* vcge - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcge_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_s32_f32 (float32x2_t __a)
+ {
+- return (uint32x2_t) (__a >= __b);
++ return __builtin_aarch64_lroundv2sfv2si (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcge_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_u32_f32 (float32x2_t __a)
+ {
+- return (uint64x1_t) (__a >= __b);
++ return __builtin_aarch64_lrounduv2sfv2si_us (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcge_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_s32_f32 (float32x4_t __a)
+ {
+- return (uint8x8_t) (__a >= __b);
++ return __builtin_aarch64_lroundv4sfv4si (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcge_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_u32_f32 (float32x4_t __a)
+ {
+- return (uint16x4_t) (__a >= __b);
++ return __builtin_aarch64_lrounduv4sfv4si_us (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcge_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_s64_f64 (float64x1_t __a)
+ {
+- return (uint32x2_t) (__a >= __b);
++ return (int64x1_t) {vcvtad_s64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcge_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_u64_f64 (float64x1_t __a)
+ {
+- return (uint64x1_t) (__a >= __b);
++ return (uint64x1_t) {vcvtad_u64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcge_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_s64_f64 (float64x2_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_aarch64_lroundv2dfv2di (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcge_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_u64_f64 (float64x2_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_aarch64_lrounduv2dfv2di_us (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcge_u32 (uint32x2_t __a, uint32x2_t __b)
++/* vcvtm */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmd_s64_f64 (float64_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_llfloor (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcge_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmd_u64_f64 (float64_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_aarch64_lfloorudfdi_us (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgeq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtms_s32_f32 (float32_t __a)
+ {
+- return (uint32x4_t) (__a >= __b);
++ return __builtin_ifloorf (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgeq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtms_u32_f32 (float32_t __a)
+ {
+- return (uint64x2_t) (__a >= __b);
++ return __builtin_aarch64_lfloorusfsi_us (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgeq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_s32_f32 (float32x2_t __a)
+ {
+- return (uint8x16_t) (__a >= __b);
++ return __builtin_aarch64_lfloorv2sfv2si (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgeq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_u32_f32 (float32x2_t __a)
+ {
+- return (uint16x8_t) (__a >= __b);
++ return __builtin_aarch64_lflooruv2sfv2si_us (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgeq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_s32_f32 (float32x4_t __a)
+ {
+- return (uint32x4_t) (__a >= __b);
++ return __builtin_aarch64_lfloorv4sfv4si (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgeq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_u32_f32 (float32x4_t __a)
+ {
+- return (uint64x2_t) (__a >= __b);
++ return __builtin_aarch64_lflooruv4sfv4si_us (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgeq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_s64_f64 (float64x1_t __a)
+ {
+- return (__a >= __b);
++ return (int64x1_t) {vcvtmd_s64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgeq_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_u64_f64 (float64x1_t __a)
+ {
+- return (__a >= __b);
++ return (uint64x1_t) {vcvtmd_u64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgeq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_s64_f64 (float64x2_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_aarch64_lfloorv2dfv2di (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgeq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_u64_f64 (float64x2_t __a)
+ {
+- return (__a >= __b);
++ return __builtin_aarch64_lflooruv2dfv2di_us (__a);
+ }
+
+-/* vcge - scalar. */
++/* vcvtn */
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcges_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnd_s64_f64 (float64_t __a)
+ {
+- return __a >= __b ? -1 : 0;
++ return __builtin_aarch64_lfrintndfdi (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcged_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnd_u64_f64 (float64_t __a)
+ {
+- return __a >= __b ? -1ll : 0ll;
++ return __builtin_aarch64_lfrintnudfdi_us (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcged_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtns_s32_f32 (float32_t __a)
+ {
+- return __a >= __b ? -1ll : 0ll;
++ return __builtin_aarch64_lfrintnsfsi (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcged_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtns_u32_f32 (float32_t __a)
+ {
+- return __a >= __b ? -1ll : 0ll;
++ return __builtin_aarch64_lfrintnusfsi_us (__a);
+ }
+
+-/* vcgez - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgez_f32 (float32x2_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_s32_f32 (float32x2_t __a)
+ {
+- return (uint32x2_t) (__a >= 0.0f);
++ return __builtin_aarch64_lfrintnv2sfv2si (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgez_f64 (float64x1_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_u32_f32 (float32x2_t __a)
+ {
+- return (uint64x1_t) (__a[0] >= (float64x1_t) {0.0});
++ return __builtin_aarch64_lfrintnuv2sfv2si_us (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcgez_s8 (int8x8_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_s32_f32 (float32x4_t __a)
+ {
+- return (uint8x8_t) (__a >= 0);
++ return __builtin_aarch64_lfrintnv4sfv4si (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcgez_s16 (int16x4_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_u32_f32 (float32x4_t __a)
+ {
+- return (uint16x4_t) (__a >= 0);
++ return __builtin_aarch64_lfrintnuv4sfv4si_us (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgez_s32 (int32x2_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_s64_f64 (float64x1_t __a)
+ {
+- return (uint32x2_t) (__a >= 0);
++ return (int64x1_t) {vcvtnd_s64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgez_s64 (int64x1_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_u64_f64 (float64x1_t __a)
+ {
+- return (uint64x1_t) (__a >= __AARCH64_INT64_C (0));
++ return (uint64x1_t) {vcvtnd_u64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgezq_f32 (float32x4_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_s64_f64 (float64x2_t __a)
+ {
+- return (uint32x4_t) (__a >= 0.0f);
++ return __builtin_aarch64_lfrintnv2dfv2di (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgezq_f64 (float64x2_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_u64_f64 (float64x2_t __a)
+ {
+- return (uint64x2_t) (__a >= 0.0);
++ return __builtin_aarch64_lfrintnuv2dfv2di_us (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgezq_s8 (int8x16_t __a)
++/* vcvtp */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpd_s64_f64 (float64_t __a)
+ {
+- return (uint8x16_t) (__a >= 0);
++ return __builtin_llceil (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgezq_s16 (int16x8_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpd_u64_f64 (float64_t __a)
+ {
+- return (uint16x8_t) (__a >= 0);
++ return __builtin_aarch64_lceiludfdi_us (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgezq_s32 (int32x4_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtps_s32_f32 (float32_t __a)
+ {
+- return (uint32x4_t) (__a >= 0);
++ return __builtin_iceilf (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgezq_s64 (int64x2_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtps_u32_f32 (float32_t __a)
+ {
+- return (uint64x2_t) (__a >= __AARCH64_INT64_C (0));
++ return __builtin_aarch64_lceilusfsi_us (__a);
+ }
+
+-/* vcgez - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcgezs_f32 (float32_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_s32_f32 (float32x2_t __a)
+ {
+- return __a >= 0.0f ? -1 : 0;
++ return __builtin_aarch64_lceilv2sfv2si (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgezd_s64 (int64_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_u32_f32 (float32x2_t __a)
+ {
+- return __a >= 0 ? -1ll : 0ll;
++ return __builtin_aarch64_lceiluv2sfv2si_us (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgezd_f64 (float64_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_s32_f32 (float32x4_t __a)
+ {
+- return __a >= 0.0 ? -1ll : 0ll;
++ return __builtin_aarch64_lceilv4sfv4si (__a);
+ }
+
+-/* vcgt - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgt_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_u32_f32 (float32x4_t __a)
+ {
+- return (uint32x2_t) (__a > __b);
++ return __builtin_aarch64_lceiluv4sfv4si_us (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgt_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_s64_f64 (float64x1_t __a)
+ {
+- return (uint64x1_t) (__a > __b);
++ return (int64x1_t) {vcvtpd_s64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcgt_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_u64_f64 (float64x1_t __a)
+ {
+- return (uint8x8_t) (__a > __b);
++ return (uint64x1_t) {vcvtpd_u64_f64 (__a[0])};
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcgt_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_s64_f64 (float64x2_t __a)
+ {
+- return (uint16x4_t) (__a > __b);
++ return __builtin_aarch64_lceilv2dfv2di (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgt_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_u64_f64 (float64x2_t __a)
+ {
+- return (uint32x2_t) (__a > __b);
++ return __builtin_aarch64_lceiluv2dfv2di_us (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgt_s64 (int64x1_t __a, int64x1_t __b)
++/* vdup_n */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_f16 (float16_t __a)
+ {
+- return (uint64x1_t) (__a > __b);
++ return (float16x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcgt_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_f32 (float32_t __a)
+ {
+- return (__a > __b);
++ return (float32x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcgt_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_f64 (float64_t __a)
+ {
+- return (__a > __b);
++ return (float64x1_t) {__a};
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgt_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_p8 (poly8_t __a)
+ {
+- return (__a > __b);
++ return (poly8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgt_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_p16 (poly16_t __a)
+ {
+- return (__a > __b);
++ return (poly16x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgtq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_s8 (int8_t __a)
+ {
+- return (uint32x4_t) (__a > __b);
++ return (int8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgtq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_s16 (int16_t __a)
+ {
+- return (uint64x2_t) (__a > __b);
++ return (int16x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgtq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_s32 (int32_t __a)
+ {
+- return (uint8x16_t) (__a > __b);
++ return (int32x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgtq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_s64 (int64_t __a)
+ {
+- return (uint16x8_t) (__a > __b);
++ return (int64x1_t) {__a};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgtq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_u8 (uint8_t __a)
+ {
+- return (uint32x4_t) (__a > __b);
++ return (uint8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgtq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_u16 (uint16_t __a)
+ {
+- return (uint64x2_t) (__a > __b);
++ return (uint16x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgtq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_u32 (uint32_t __a)
+ {
+- return (__a > __b);
++ return (uint32x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgtq_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_n_u64 (uint64_t __a)
+ {
+- return (__a > __b);
++ return (uint64x1_t) {__a};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgtq_u32 (uint32x4_t __a, uint32x4_t __b)
++/* vdupq_n */
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_f16 (float16_t __a)
+ {
+- return (__a > __b);
++ return (float16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgtq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_f32 (float32_t __a)
+ {
+- return (__a > __b);
++ return (float32x4_t) {__a, __a, __a, __a};
+ }
+
+-/* vcgt - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcgts_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_f64 (float64_t __a)
+ {
+- return __a > __b ? -1 : 0;
++ return (float64x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgtd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_p8 (uint32_t __a)
+ {
+- return __a > __b ? -1ll : 0ll;
++ return (poly8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
++ __a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgtd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_p16 (uint32_t __a)
+ {
+- return __a > __b ? -1ll : 0ll;
++ return (poly16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgtd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_s8 (int32_t __a)
+ {
+- return __a > __b ? -1ll : 0ll;
++ return (int8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
++ __a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-/* vcgtz - vector. */
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_s16 (int32_t __a)
++{
++ return (int16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++}
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgtz_f32 (float32x2_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_s32 (int32_t __a)
+ {
+- return (uint32x2_t) (__a > 0.0f);
++ return (int32x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgtz_f64 (float64x1_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_s64 (int64_t __a)
+ {
+- return (uint64x1_t) (__a > (float64x1_t) {0.0});
++ return (int64x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcgtz_s8 (int8x8_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_u8 (uint32_t __a)
+ {
+- return (uint8x8_t) (__a > 0);
++ return (uint8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
++ __a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcgtz_s16 (int16x4_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_u16 (uint32_t __a)
+ {
+- return (uint16x4_t) (__a > 0);
++ return (uint16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcgtz_s32 (int32x2_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_u32 (uint32_t __a)
+ {
+- return (uint32x2_t) (__a > 0);
++ return (uint32x4_t) {__a, __a, __a, __a};
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcgtz_s64 (int64x1_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_n_u64 (uint64_t __a)
+ {
+- return (uint64x1_t) (__a > __AARCH64_INT64_C (0));
++ return (uint64x2_t) {__a, __a};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgtzq_f32 (float32x4_t __a)
++/* vdup_lane */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_f16 (float16x4_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a > 0.0f);
++ return __aarch64_vdup_lane_f16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgtzq_f64 (float64x2_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_f32 (float32x2_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a > 0.0);
++ return __aarch64_vdup_lane_f32 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcgtzq_s8 (int8x16_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_f64 (float64x1_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a > 0);
++ return __aarch64_vdup_lane_f64 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcgtzq_s16 (int16x8_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_p8 (poly8x8_t __a, const int __b)
+ {
+- return (uint16x8_t) (__a > 0);
++ return __aarch64_vdup_lane_p8 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcgtzq_s32 (int32x4_t __a)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_p16 (poly16x4_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a > 0);
++ return __aarch64_vdup_lane_p16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcgtzq_s64 (int64x2_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_s8 (int8x8_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a > __AARCH64_INT64_C (0));
++ return __aarch64_vdup_lane_s8 (__a, __b);
+ }
+
+-/* vcgtz - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcgtzs_f32 (float32_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_s16 (int16x4_t __a, const int __b)
+ {
+- return __a > 0.0f ? -1 : 0;
++ return __aarch64_vdup_lane_s16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgtzd_s64 (int64_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_s32 (int32x2_t __a, const int __b)
+ {
+- return __a > 0 ? -1ll : 0ll;
++ return __aarch64_vdup_lane_s32 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcgtzd_f64 (float64_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_s64 (int64x1_t __a, const int __b)
+ {
+- return __a > 0.0 ? -1ll : 0ll;
++ return __aarch64_vdup_lane_s64 (__a, __b);
+ }
+
+-/* vcle - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcle_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_u8 (uint8x8_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a <= __b);
++ return __aarch64_vdup_lane_u8 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcle_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_u16 (uint16x4_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a <= __b);
++ return __aarch64_vdup_lane_u16 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcle_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_u32 (uint32x2_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a <= __b);
++ return __aarch64_vdup_lane_u32 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcle_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_lane_u64 (uint64x1_t __a, const int __b)
+ {
+- return (uint16x4_t) (__a <= __b);
++ return __aarch64_vdup_lane_u64 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcle_s32 (int32x2_t __a, int32x2_t __b)
++/* vdup_laneq */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_f16 (float16x8_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a <= __b);
++ return __aarch64_vdup_laneq_f16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcle_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_f32 (float32x4_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a <= __b);
++ return __aarch64_vdup_laneq_f32 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcle_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_f64 (float64x2_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdup_laneq_f64 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcle_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_p8 (poly8x16_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdup_laneq_p8 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcle_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_p16 (poly16x8_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdup_laneq_p16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcle_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_s8 (int8x16_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdup_laneq_s8 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcleq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_s16 (int16x8_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a <= __b);
++ return __aarch64_vdup_laneq_s16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcleq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_s32 (int32x4_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a <= __b);
++ return __aarch64_vdup_laneq_s32 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcleq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_s64 (int64x2_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a <= __b);
++ return __aarch64_vdup_laneq_s64 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcleq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_u8 (uint8x16_t __a, const int __b)
+ {
+- return (uint16x8_t) (__a <= __b);
++ return __aarch64_vdup_laneq_u8 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcleq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_u16 (uint16x8_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a <= __b);
++ return __aarch64_vdup_laneq_u16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcleq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_u32 (uint32x4_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a <= __b);
++ return __aarch64_vdup_laneq_u32 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcleq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdup_laneq_u64 (uint64x2_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdup_laneq_u64 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcleq_u16 (uint16x8_t __a, uint16x8_t __b)
++/* vdupq_lane */
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_f16 (float16x4_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdupq_lane_f16 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcleq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_f32 (float32x2_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdupq_lane_f32 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcleq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_f64 (float64x1_t __a, const int __b)
+ {
+- return (__a <= __b);
++ return __aarch64_vdupq_lane_f64 (__a, __b);
+ }
+
+-/* vcle - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcles_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_p8 (poly8x8_t __a, const int __b)
+ {
+- return __a <= __b ? -1 : 0;
++ return __aarch64_vdupq_lane_p8 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcled_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_p16 (poly16x4_t __a, const int __b)
+ {
+- return __a <= __b ? -1ll : 0ll;
++ return __aarch64_vdupq_lane_p16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcled_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_s8 (int8x8_t __a, const int __b)
+ {
+- return __a <= __b ? -1ll : 0ll;
++ return __aarch64_vdupq_lane_s8 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcled_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_s16 (int16x4_t __a, const int __b)
+ {
+- return __a <= __b ? -1ll : 0ll;
++ return __aarch64_vdupq_lane_s16 (__a, __b);
+ }
+
+-/* vclez - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclez_f32 (float32x2_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_s32 (int32x2_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a <= 0.0f);
++ return __aarch64_vdupq_lane_s32 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vclez_f64 (float64x1_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_s64 (int64x1_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a <= (float64x1_t) {0.0});
++ return __aarch64_vdupq_lane_s64 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vclez_s8 (int8x8_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_u8 (uint8x8_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a <= 0);
++ return __aarch64_vdupq_lane_u8 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vclez_s16 (int16x4_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_u16 (uint16x4_t __a, const int __b)
+ {
+- return (uint16x4_t) (__a <= 0);
++ return __aarch64_vdupq_lane_u16 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclez_s32 (int32x2_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_u32 (uint32x2_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a <= 0);
++ return __aarch64_vdupq_lane_u32 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vclez_s64 (int64x1_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_lane_u64 (uint64x1_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a <= __AARCH64_INT64_C (0));
++ return __aarch64_vdupq_lane_u64 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vclezq_f32 (float32x4_t __a)
++/* vdupq_laneq */
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_f16 (float16x8_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a <= 0.0f);
++ return __aarch64_vdupq_laneq_f16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vclezq_f64 (float64x2_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_f32 (float32x4_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a <= 0.0);
++ return __aarch64_vdupq_laneq_f32 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vclezq_s8 (int8x16_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_f64 (float64x2_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a <= 0);
++ return __aarch64_vdupq_laneq_f64 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vclezq_s16 (int16x8_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_p8 (poly8x16_t __a, const int __b)
+ {
+- return (uint16x8_t) (__a <= 0);
++ return __aarch64_vdupq_laneq_p8 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vclezq_s32 (int32x4_t __a)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_p16 (poly16x8_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a <= 0);
++ return __aarch64_vdupq_laneq_p16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vclezq_s64 (int64x2_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_s8 (int8x16_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a <= __AARCH64_INT64_C (0));
++ return __aarch64_vdupq_laneq_s8 (__a, __b);
+ }
+
+-/* vclez - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vclezs_f32 (float32_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_s16 (int16x8_t __a, const int __b)
+ {
+- return __a <= 0.0f ? -1 : 0;
++ return __aarch64_vdupq_laneq_s16 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vclezd_s64 (int64_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_s32 (int32x4_t __a, const int __b)
+ {
+- return __a <= 0 ? -1ll : 0ll;
++ return __aarch64_vdupq_laneq_s32 (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vclezd_f64 (float64_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_s64 (int64x2_t __a, const int __b)
+ {
+- return __a <= 0.0 ? -1ll : 0ll;
++ return __aarch64_vdupq_laneq_s64 (__a, __b);
+ }
+
+-/* vclt - vector. */
+-
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclt_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_u8 (uint8x16_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a < __b);
++ return __aarch64_vdupq_laneq_u8 (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vclt_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_u16 (uint16x8_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a < __b);
++ return __aarch64_vdupq_laneq_u16 (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vclt_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_u32 (uint32x4_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a < __b);
++ return __aarch64_vdupq_laneq_u32 (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vclt_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupq_laneq_u64 (uint64x2_t __a, const int __b)
+ {
+- return (uint16x4_t) (__a < __b);
++ return __aarch64_vdupq_laneq_u64 (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclt_s32 (int32x2_t __a, int32x2_t __b)
++/* vdupb_lane */
++__extension__ extern __inline poly8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_lane_p8 (poly8x8_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vclt_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_lane_s8 (int8x8_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vclt_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_lane_u8 (uint8x8_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vclt_u16 (uint16x4_t __a, uint16x4_t __b)
++/* vduph_lane */
++
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_lane_f16 (float16x4_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclt_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline poly16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_lane_p16 (poly16x4_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vclt_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_lane_s16 (int16x4_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcltq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_lane_u16 (uint16x4_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcltq_f64 (float64x2_t __a, float64x2_t __b)
++/* vdups_lane */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_lane_f32 (float32x2_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcltq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_lane_s32 (int32x2_t __a, const int __b)
+ {
+- return (uint8x16_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcltq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_lane_u32 (uint32x2_t __a, const int __b)
+ {
+- return (uint16x8_t) (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcltq_s32 (int32x4_t __a, int32x4_t __b)
++/* vdupd_lane */
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_lane_f64 (float64x1_t __a, const int __b)
+ {
+- return (uint32x4_t) (__a < __b);
++ __AARCH64_LANE_CHECK (__a, __b);
++ return __a[0];
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcltq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_lane_s64 (int64x1_t __a, const int __b)
+ {
+- return (uint64x2_t) (__a < __b);
++ __AARCH64_LANE_CHECK (__a, __b);
++ return __a[0];
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcltq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_lane_u64 (uint64x1_t __a, const int __b)
+ {
+- return (__a < __b);
++ __AARCH64_LANE_CHECK (__a, __b);
++ return __a[0];
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcltq_u16 (uint16x8_t __a, uint16x8_t __b)
++/* vdupb_laneq */
++__extension__ extern __inline poly8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_laneq_p8 (poly8x16_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcltq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_laneq_s8 (int8x16_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcltq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupb_laneq_u8 (uint8x16_t __a, const int __b)
+ {
+- return (__a < __b);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-/* vclt - scalar. */
++/* vduph_laneq */
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vclts_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_laneq_f16 (float16x8_t __a, const int __b)
+ {
+- return __a < __b ? -1 : 0;
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcltd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline poly16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_laneq_p16 (poly16x8_t __a, const int __b)
+ {
+- return __a < __b ? -1ll : 0ll;
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcltd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_laneq_s16 (int16x8_t __a, const int __b)
+ {
+- return __a < __b ? -1ll : 0ll;
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcltd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vduph_laneq_u16 (uint16x8_t __a, const int __b)
+ {
+- return __a < __b ? -1ll : 0ll;
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-/* vcltz - vector. */
++/* vdups_laneq */
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcltz_f32 (float32x2_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_laneq_f32 (float32x4_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a < 0.0f);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcltz_f64 (float64x1_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_laneq_s32 (int32x4_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a < (float64x1_t) {0.0});
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcltz_s8 (int8x8_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdups_laneq_u32 (uint32x4_t __a, const int __b)
+ {
+- return (uint8x8_t) (__a < 0);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vcltz_s16 (int16x4_t __a)
++/* vdupd_laneq */
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_laneq_f64 (float64x2_t __a, const int __b)
+ {
+- return (uint16x4_t) (__a < 0);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcltz_s32 (int32x2_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_laneq_s64 (int64x2_t __a, const int __b)
+ {
+- return (uint32x2_t) (__a < 0);
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcltz_s64 (int64x1_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdupd_laneq_u64 (uint64x2_t __a, const int __b)
+ {
+- return (uint64x1_t) (__a < __AARCH64_INT64_C (0));
++ return __aarch64_vget_lane_any (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcltzq_f32 (float32x4_t __a)
++/* vext */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_f16 (float16x4_t __a, float16x4_t __b, __const int __c)
+ {
+- return (uint32x4_t) (__a < 0.0f);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint16x4_t) {4 - __c, 5 - __c, 6 - __c, 7 - __c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x4_t) {__c, __c + 1, __c + 2, __c + 3});
++#endif
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcltzq_f64 (float64x2_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_f32 (float32x2_t __a, float32x2_t __b, __const int __c)
+ {
+- return (uint64x2_t) (__a < 0.0);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
++#endif
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcltzq_s8 (int8x16_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_f64 (float64x1_t __a, float64x1_t __b, __const int __c)
+ {
+- return (uint8x16_t) (__a < 0);
++ __AARCH64_LANE_CHECK (__a, __c);
++ /* The only possible index to the assembler instruction returns element 0. */
++ return __a;
+ }
+-
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vcltzq_s16 (int16x8_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_p8 (poly8x8_t __a, poly8x8_t __b, __const int __c)
+ {
+- return (uint16x8_t) (__a < 0);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcltzq_s32 (int32x4_t __a)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_p16 (poly16x4_t __a, poly16x4_t __b, __const int __c)
+ {
+- return (uint32x4_t) (__a < 0);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcltzq_s64 (int64x2_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_s8 (int8x8_t __a, int8x8_t __b, __const int __c)
+ {
+- return (uint64x2_t) (__a < __AARCH64_INT64_C (0));
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
+ }
+
+-/* vcltz - scalar. */
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcltzs_f32 (float32_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_s16 (int16x4_t __a, int16x4_t __b, __const int __c)
+ {
+- return __a < 0.0f ? -1 : 0;
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcltzd_s64 (int64_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_s32 (int32x2_t __a, int32x2_t __b, __const int __c)
+ {
+- return __a < 0 ? -1ll : 0ll;
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
++#endif
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcltzd_f64 (float64_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_s64 (int64x1_t __a, int64x1_t __b, __const int __c)
+ {
+- return __a < 0.0 ? -1ll : 0ll;
++ __AARCH64_LANE_CHECK (__a, __c);
++ /* The only possible index to the assembler instruction returns element 0. */
++ return __a;
+ }
+
+-/* vcls. */
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_u8 (uint8x8_t __a, uint8x8_t __b, __const int __c)
++{
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vcls_s8 (int8x8_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_u16 (uint16x4_t __a, uint16x4_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv8qi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vcls_s16 (int16x4_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_u32 (uint32x2_t __a, uint32x2_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv4hi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
++#endif
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcls_s32 (int32x2_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv2si (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++ /* The only possible index to the assembler instruction returns element 0. */
++ return __a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vclsq_s8 (int8x16_t __a)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_f16 (float16x8_t __a, float16x8_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv16qi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint16x8_t) {8 - __c, 9 - __c, 10 - __c, 11 - __c,
++ 12 - __c, 13 - __c, 14 - __c,
++ 15 - __c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {__c, __c + 1, __c + 2, __c + 3,
++ __c + 4, __c + 5, __c + 6, __c + 7});
++#endif
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vclsq_s16 (int16x8_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_f32 (float32x4_t __a, float32x4_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv8hi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vclsq_s32 (int32x4_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_f64 (float64x2_t __a, float64x2_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clrsbv4si (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
++#endif
+ }
+
+-/* vclz. */
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_p8 (poly8x16_t __a, poly8x16_t __b, __const int __c)
++{
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x16_t)
++ {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
++ 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
++ __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
++#endif
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vclz_s8 (int8x8_t __a)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_p16 (poly16x8_t __a, poly16x8_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv8qi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint16x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vclz_s16 (int16x4_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_s8 (int8x16_t __a, int8x16_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv4hi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x16_t)
++ {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
++ 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
++ __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
++#endif
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vclz_s32 (int32x2_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_s16 (int16x8_t __a, int16x8_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv2si (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint16x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vclz_u8 (uint8x8_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_s32 (int32x4_t __a, int32x4_t __b, __const int __c)
+ {
+- return (uint8x8_t)__builtin_aarch64_clzv8qi ((int8x8_t)__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vclz_u16 (uint16x4_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_s64 (int64x2_t __a, int64x2_t __b, __const int __c)
+ {
+- return (uint16x4_t)__builtin_aarch64_clzv4hi ((int16x4_t)__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
++#endif
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vclz_u32 (uint32x2_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_u8 (uint8x16_t __a, uint8x16_t __b, __const int __c)
+ {
+- return (uint32x2_t)__builtin_aarch64_clzv2si ((int32x2_t)__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint8x16_t)
++ {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
++ 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
++ __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
++#endif
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vclzq_s8 (int8x16_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_u16 (uint16x8_t __a, uint16x8_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv16qi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint16x8_t)
++ {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
++#endif
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vclzq_s16 (int16x8_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_u32 (uint32x4_t __a, uint32x4_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv8hi (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a,
++ (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
++#endif
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vclzq_s32 (int32x4_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vextq_u64 (uint64x2_t __a, uint64x2_t __b, __const int __c)
+ {
+- return __builtin_aarch64_clzv4si (__a);
++ __AARCH64_LANE_CHECK (__a, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
++#endif
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vclzq_u8 (uint8x16_t __a)
+-{
+- return (uint8x16_t)__builtin_aarch64_clzv16qi ((int8x16_t)__a);
+-}
++/* vfma */
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vclzq_u16 (uint16x8_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
+ {
+- return (uint16x8_t)__builtin_aarch64_clzv8hi ((int16x8_t)__a);
++ return (float64x1_t) {__builtin_fma (__b[0], __c[0], __a[0])};
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vclzq_u32 (uint32x4_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
+ {
+- return (uint32x4_t)__builtin_aarch64_clzv4si ((int32x4_t)__a);
++ return __builtin_aarch64_fmav2sf (__b, __c, __a);
+ }
+
+-/* vcnt. */
+-
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vcnt_p8 (poly8x8_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
+ {
+- return (poly8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
++ return __builtin_aarch64_fmav4sf (__b, __c, __a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vcnt_s8 (int8x8_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c)
+ {
+- return __builtin_aarch64_popcountv8qi (__a);
++ return __builtin_aarch64_fmav2df (__b, __c, __a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vcnt_u8 (uint8x8_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
+ {
+- return (uint8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
++ return __builtin_aarch64_fmav2sf (__b, vdup_n_f32 (__c), __a);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vcntq_p8 (poly8x16_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_n_f64 (float64x1_t __a, float64x1_t __b, float64_t __c)
+ {
+- return (poly8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
++ return (float64x1_t) {__b[0] * __c + __a[0]};
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vcntq_s8 (int8x16_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
+ {
+- return __builtin_aarch64_popcountv16qi (__a);
++ return __builtin_aarch64_fmav4sf (__b, vdupq_n_f32 (__c), __a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vcntq_u8 (uint8x16_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_n_f64 (float64x2_t __a, float64x2_t __b, float64_t __c)
+ {
+- return (uint8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
++ return __builtin_aarch64_fmav2df (__b, vdupq_n_f64 (__c), __a);
+ }
+
+-/* vcvt (double -> float). */
++/* vfma_lane */
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+-vcvt_f16_f32 (float32x4_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_lane_f32 (float32x2_t __a, float32x2_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_float_truncate_lo_v4hf (__a);
++ return __builtin_aarch64_fmav2sf (__b,
++ __aarch64_vdup_lane_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+-vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_lane_f64 (float64x1_t __a, float64x1_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
++ return (float64x1_t) {__builtin_fma (__b[0], __c[0], __a[0])};
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vcvt_f32_f64 (float64x2_t __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmad_lane_f64 (float64_t __a, float64_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return __builtin_aarch64_float_truncate_lo_v2sf (__a);
++ return __builtin_fma (__b, __c[0], __a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmas_lane_f32 (float32_t __a, float32_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_float_truncate_hi_v4sf (__a, __b);
++ return __builtin_fmaf (__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-/* vcvt (float -> double). */
++/* vfma_laneq */
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvt_f32_f16 (float16x4_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_laneq_f32 (float32x2_t __a, float32x2_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_float_extend_lo_v4sf (__a);
++ return __builtin_aarch64_fmav2sf (__b,
++ __aarch64_vdup_laneq_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vcvt_f64_f32 (float32x2_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_laneq_f64 (float64x1_t __a, float64x1_t __b,
++ float64x2_t __c, const int __lane)
+ {
+-
+- return __builtin_aarch64_float_extend_lo_v2df (__a);
++ float64_t __c0 = __aarch64_vget_lane_any (__c, __lane);
++ return (float64x1_t) {__builtin_fma (__b[0], __c0, __a[0])};
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvt_high_f32_f16 (float16x8_t __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmad_laneq_f64 (float64_t __a, float64_t __b,
++ float64x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
++ return __builtin_fma (__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vcvt_high_f64_f32 (float32x4_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmas_laneq_f32 (float32_t __a, float32_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_vec_unpacks_hi_v4sf (__a);
++ return __builtin_fmaf (__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-/* vcvt (<u>int -> float) */
+-
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vcvtd_f64_s64 (int64_t __a)
+-{
+- return (float64_t) __a;
+-}
++/* vfmaq_lane */
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vcvtd_f64_u64 (uint64_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_lane_f32 (float32x4_t __a, float32x4_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return (float64_t) __a;
++ return __builtin_aarch64_fmav4sf (__b,
++ __aarch64_vdupq_lane_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vcvts_f32_s32 (int32_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_lane_f64 (float64x2_t __a, float64x2_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return (float32_t) __a;
++ return __builtin_aarch64_fmav2df (__b, vdupq_n_f64 (__c[0]), __a);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vcvts_f32_u32 (uint32_t __a)
+-{
+- return (float32_t) __a;
+-}
++/* vfmaq_laneq */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vcvt_f32_s32 (int32x2_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_laneq_f32 (float32x4_t __a, float32x4_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_floatv2siv2sf (__a);
++ return __builtin_aarch64_fmav4sf (__b,
++ __aarch64_vdupq_laneq_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vcvt_f32_u32 (uint32x2_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_laneq_f64 (float64x2_t __a, float64x2_t __b,
++ float64x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_floatunsv2siv2sf ((int32x2_t) __a);
++ return __builtin_aarch64_fmav2df (__b,
++ __aarch64_vdupq_laneq_f64 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvtq_f32_s32 (int32x4_t __a)
+-{
+- return __builtin_aarch64_floatv4siv4sf (__a);
+-}
++/* vfms */
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vcvtq_f32_u32 (uint32x4_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
+ {
+- return __builtin_aarch64_floatunsv4siv4sf ((int32x4_t) __a);
++ return (float64x1_t) {__builtin_fma (-__b[0], __c[0], __a[0])};
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vcvtq_f64_s64 (int64x2_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
+ {
+- return __builtin_aarch64_floatv2div2df (__a);
++ return __builtin_aarch64_fmav2sf (-__b, __c, __a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vcvtq_f64_u64 (uint64x2_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
+ {
+- return __builtin_aarch64_floatunsv2div2df ((int64x2_t) __a);
++ return __builtin_aarch64_fmav4sf (-__b, __c, __a);
+ }
+
+-/* vcvt (float -> <u>int) */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vcvtd_s64_f64 (float64_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c)
+ {
+- return (int64_t) __a;
++ return __builtin_aarch64_fmav2df (-__b, __c, __a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcvtd_u64_f64 (float64_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
+ {
+- return (uint64_t) __a;
++ return __builtin_aarch64_fmav2sf (-__b, vdup_n_f32 (__c), __a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vcvts_s32_f32 (float32_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_n_f64 (float64x1_t __a, float64x1_t __b, float64_t __c)
+ {
+- return (int32_t) __a;
++ return (float64x1_t) {-__b[0] * __c + __a[0]};
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcvts_u32_f32 (float32_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
+ {
+- return (uint32_t) __a;
++ return __builtin_aarch64_fmav4sf (-__b, vdupq_n_f32 (__c), __a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcvt_s32_f32 (float32x2_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_n_f64 (float64x2_t __a, float64x2_t __b, float64_t __c)
+ {
+- return __builtin_aarch64_lbtruncv2sfv2si (__a);
++ return __builtin_aarch64_fmav2df (-__b, vdupq_n_f64 (__c), __a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcvt_u32_f32 (float32x2_t __a)
+-{
+- return __builtin_aarch64_lbtruncuv2sfv2si_us (__a);
+-}
++/* vfms_lane */
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vcvtq_s32_f32 (float32x4_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_lane_f32 (float32x2_t __a, float32x2_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lbtruncv4sfv4si (__a);
++ return __builtin_aarch64_fmav2sf (-__b,
++ __aarch64_vdup_lane_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcvtq_u32_f32 (float32x4_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_lane_f64 (float64x1_t __a, float64x1_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lbtruncuv4sfv4si_us (__a);
++ return (float64x1_t) {__builtin_fma (-__b[0], __c[0], __a[0])};
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vcvt_s64_f64 (float64x1_t __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsd_lane_f64 (float64_t __a, float64_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return (int64x1_t) {vcvtd_s64_f64 (__a[0])};
++ return __builtin_fma (-__b, __c[0], __a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcvt_u64_f64 (float64x1_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmss_lane_f32 (float32_t __a, float32_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return (uint64x1_t) {vcvtd_u64_f64 (__a[0])};
++ return __builtin_fmaf (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vcvtq_s64_f64 (float64x2_t __a)
+-{
+- return __builtin_aarch64_lbtruncv2dfv2di (__a);
+-}
++/* vfms_laneq */
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcvtq_u64_f64 (float64x2_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_laneq_f32 (float32x2_t __a, float32x2_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lbtruncuv2dfv2di_us (__a);
++ return __builtin_aarch64_fmav2sf (-__b,
++ __aarch64_vdup_laneq_f32 (__c, __lane),
++ __a);
+ }
+
+-/* vcvta */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vcvtad_s64_f64 (float64_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_laneq_f64 (float64x1_t __a, float64x1_t __b,
++ float64x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lrounddfdi (__a);
++ float64_t __c0 = __aarch64_vget_lane_any (__c, __lane);
++ return (float64x1_t) {__builtin_fma (-__b[0], __c0, __a[0])};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcvtad_u64_f64 (float64_t __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsd_laneq_f64 (float64_t __a, float64_t __b,
++ float64x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lroundudfdi_us (__a);
++ return __builtin_fma (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vcvtas_s32_f32 (float32_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmss_laneq_f32 (float32_t __a, float32_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lroundsfsi (__a);
++ return __builtin_fmaf (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcvtas_u32_f32 (float32_t __a)
++/* vfmsq_lane */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_lane_f32 (float32x4_t __a, float32x4_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lroundusfsi_us (__a);
++ return __builtin_aarch64_fmav4sf (-__b,
++ __aarch64_vdupq_lane_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcvta_s32_f32 (float32x2_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_lane_f64 (float64x2_t __a, float64x2_t __b,
++ float64x1_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lroundv2sfv2si (__a);
++ return __builtin_aarch64_fmav2df (-__b, vdupq_n_f64 (__c[0]), __a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcvta_u32_f32 (float32x2_t __a)
++/* vfmsq_laneq */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_laneq_f32 (float32x4_t __a, float32x4_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lrounduv2sfv2si_us (__a);
++ return __builtin_aarch64_fmav4sf (-__b,
++ __aarch64_vdupq_laneq_f32 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vcvtaq_s32_f32 (float32x4_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_laneq_f64 (float64x2_t __a, float64x2_t __b,
++ float64x2_t __c, const int __lane)
+ {
+- return __builtin_aarch64_lroundv4sfv4si (__a);
++ return __builtin_aarch64_fmav2df (-__b,
++ __aarch64_vdupq_laneq_f64 (__c, __lane),
++ __a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcvtaq_u32_f32 (float32x4_t __a)
++/* vld1 */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_f16 (const float16_t *__a)
+ {
+- return __builtin_aarch64_lrounduv4sfv4si_us (__a);
++ return __builtin_aarch64_ld1v4hf (__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vcvta_s64_f64 (float64x1_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_f32 (const float32_t *a)
+ {
+- return (int64x1_t) {vcvtad_s64_f64 (__a[0])};
++ return __builtin_aarch64_ld1v2sf ((const __builtin_aarch64_simd_sf *) a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcvta_u64_f64 (float64x1_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_f64 (const float64_t *a)
+ {
+- return (uint64x1_t) {vcvtad_u64_f64 (__a[0])};
++ return (float64x1_t) {*a};
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vcvtaq_s64_f64 (float64x2_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_p8 (const poly8_t *a)
+ {
+- return __builtin_aarch64_lroundv2dfv2di (__a);
++ return (poly8x8_t)
++ __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcvtaq_u64_f64 (float64x2_t __a)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_p16 (const poly16_t *a)
+ {
+- return __builtin_aarch64_lrounduv2dfv2di_us (__a);
++ return (poly16x4_t)
++ __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-/* vcvtm */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vcvtmd_s64_f64 (float64_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_s8 (const int8_t *a)
+ {
+- return __builtin_llfloor (__a);
++ return __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcvtmd_u64_f64 (float64_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_s16 (const int16_t *a)
+ {
+- return __builtin_aarch64_lfloorudfdi_us (__a);
++ return __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vcvtms_s32_f32 (float32_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_s32 (const int32_t *a)
+ {
+- return __builtin_ifloorf (__a);
++ return __builtin_aarch64_ld1v2si ((const __builtin_aarch64_simd_si *) a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcvtms_u32_f32 (float32_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_s64 (const int64_t *a)
+ {
+- return __builtin_aarch64_lfloorusfsi_us (__a);
++ return (int64x1_t) {*a};
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcvtm_s32_f32 (float32x2_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_u8 (const uint8_t *a)
+ {
+- return __builtin_aarch64_lfloorv2sfv2si (__a);
++ return (uint8x8_t)
++ __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcvtm_u32_f32 (float32x2_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_u16 (const uint16_t *a)
+ {
+- return __builtin_aarch64_lflooruv2sfv2si_us (__a);
++ return (uint16x4_t)
++ __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vcvtmq_s32_f32 (float32x4_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_u32 (const uint32_t *a)
+ {
+- return __builtin_aarch64_lfloorv4sfv4si (__a);
++ return (uint32x2_t)
++ __builtin_aarch64_ld1v2si ((const __builtin_aarch64_simd_si *) a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcvtmq_u32_f32 (float32x4_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_u64 (const uint64_t *a)
+ {
+- return __builtin_aarch64_lflooruv4sfv4si_us (__a);
++ return (uint64x1_t) {*a};
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vcvtm_s64_f64 (float64x1_t __a)
++/* vld1q */
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_f16 (const float16_t *__a)
+ {
+- return (int64x1_t) {vcvtmd_s64_f64 (__a[0])};
++ return __builtin_aarch64_ld1v8hf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcvtm_u64_f64 (float64x1_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_f32 (const float32_t *a)
+ {
+- return (uint64x1_t) {vcvtmd_u64_f64 (__a[0])};
++ return __builtin_aarch64_ld1v4sf ((const __builtin_aarch64_simd_sf *) a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vcvtmq_s64_f64 (float64x2_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_f64 (const float64_t *a)
+ {
+- return __builtin_aarch64_lfloorv2dfv2di (__a);
++ return __builtin_aarch64_ld1v2df ((const __builtin_aarch64_simd_df *) a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcvtmq_u64_f64 (float64x2_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_p8 (const poly8_t *a)
+ {
+- return __builtin_aarch64_lflooruv2dfv2di_us (__a);
++ return (poly8x16_t)
++ __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-/* vcvtn */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vcvtnd_s64_f64 (float64_t __a)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_p16 (const poly16_t *a)
+ {
+- return __builtin_aarch64_lfrintndfdi (__a);
++ return (poly16x8_t)
++ __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcvtnd_u64_f64 (float64_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_s8 (const int8_t *a)
+ {
+- return __builtin_aarch64_lfrintnudfdi_us (__a);
++ return __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vcvtns_s32_f32 (float32_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_s16 (const int16_t *a)
+ {
+- return __builtin_aarch64_lfrintnsfsi (__a);
++ return __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcvtns_u32_f32 (float32_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_s32 (const int32_t *a)
+ {
+- return __builtin_aarch64_lfrintnusfsi_us (__a);
++ return __builtin_aarch64_ld1v4si ((const __builtin_aarch64_simd_si *) a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcvtn_s32_f32 (float32x2_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_s64 (const int64_t *a)
+ {
+- return __builtin_aarch64_lfrintnv2sfv2si (__a);
++ return __builtin_aarch64_ld1v2di ((const __builtin_aarch64_simd_di *) a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcvtn_u32_f32 (float32x2_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_u8 (const uint8_t *a)
+ {
+- return __builtin_aarch64_lfrintnuv2sfv2si_us (__a);
++ return (uint8x16_t)
++ __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vcvtnq_s32_f32 (float32x4_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_u16 (const uint16_t *a)
+ {
+- return __builtin_aarch64_lfrintnv4sfv4si (__a);
++ return (uint16x8_t)
++ __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcvtnq_u32_f32 (float32x4_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_u32 (const uint32_t *a)
+ {
+- return __builtin_aarch64_lfrintnuv4sfv4si_us (__a);
++ return (uint32x4_t)
++ __builtin_aarch64_ld1v4si ((const __builtin_aarch64_simd_si *) a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vcvtn_s64_f64 (float64x1_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_u64 (const uint64_t *a)
+ {
+- return (int64x1_t) {vcvtnd_s64_f64 (__a[0])};
++ return (uint64x2_t)
++ __builtin_aarch64_ld1v2di ((const __builtin_aarch64_simd_di *) a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcvtn_u64_f64 (float64x1_t __a)
++/* vld1_dup */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_f16 (const float16_t* __a)
+ {
+- return (uint64x1_t) {vcvtnd_u64_f64 (__a[0])};
++ return vdup_n_f16 (*__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vcvtnq_s64_f64 (float64x2_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_f32 (const float32_t* __a)
+ {
+- return __builtin_aarch64_lfrintnv2dfv2di (__a);
++ return vdup_n_f32 (*__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcvtnq_u64_f64 (float64x2_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_f64 (const float64_t* __a)
+ {
+- return __builtin_aarch64_lfrintnuv2dfv2di_us (__a);
++ return vdup_n_f64 (*__a);
+ }
+
+-/* vcvtp */
+-
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vcvtpd_s64_f64 (float64_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_p8 (const poly8_t* __a)
+ {
+- return __builtin_llceil (__a);
++ return vdup_n_p8 (*__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vcvtpd_u64_f64 (float64_t __a)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_p16 (const poly16_t* __a)
+ {
+- return __builtin_aarch64_lceiludfdi_us (__a);
++ return vdup_n_p16 (*__a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vcvtps_s32_f32 (float32_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_s8 (const int8_t* __a)
+ {
+- return __builtin_iceilf (__a);
++ return vdup_n_s8 (*__a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vcvtps_u32_f32 (float32_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_s16 (const int16_t* __a)
+ {
+- return __builtin_aarch64_lceilusfsi_us (__a);
++ return vdup_n_s16 (*__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vcvtp_s32_f32 (float32x2_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_s32 (const int32_t* __a)
+ {
+- return __builtin_aarch64_lceilv2sfv2si (__a);
++ return vdup_n_s32 (*__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vcvtp_u32_f32 (float32x2_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_s64 (const int64_t* __a)
+ {
+- return __builtin_aarch64_lceiluv2sfv2si_us (__a);
++ return vdup_n_s64 (*__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vcvtpq_s32_f32 (float32x4_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_u8 (const uint8_t* __a)
+ {
+- return __builtin_aarch64_lceilv4sfv4si (__a);
++ return vdup_n_u8 (*__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vcvtpq_u32_f32 (float32x4_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_u16 (const uint16_t* __a)
+ {
+- return __builtin_aarch64_lceiluv4sfv4si_us (__a);
++ return vdup_n_u16 (*__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vcvtp_s64_f64 (float64x1_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_u32 (const uint32_t* __a)
+ {
+- return (int64x1_t) {vcvtpd_s64_f64 (__a[0])};
++ return vdup_n_u32 (*__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vcvtp_u64_f64 (float64x1_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_dup_u64 (const uint64_t* __a)
+ {
+- return (uint64x1_t) {vcvtpd_u64_f64 (__a[0])};
++ return vdup_n_u64 (*__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vcvtpq_s64_f64 (float64x2_t __a)
++/* vld1q_dup */
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_f16 (const float16_t* __a)
+ {
+- return __builtin_aarch64_lceilv2dfv2di (__a);
++ return vdupq_n_f16 (*__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vcvtpq_u64_f64 (float64x2_t __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_f32 (const float32_t* __a)
+ {
+- return __builtin_aarch64_lceiluv2dfv2di_us (__a);
++ return vdupq_n_f32 (*__a);
+ }
+
+-/* vdup_n */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vdup_n_f32 (float32_t __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_f64 (const float64_t* __a)
+ {
+- return (float32x2_t) {__a, __a};
++ return vdupq_n_f64 (*__a);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vdup_n_f64 (float64_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_p8 (const poly8_t* __a)
+ {
+- return (float64x1_t) {__a};
++ return vdupq_n_p8 (*__a);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vdup_n_p8 (poly8_t __a)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_p16 (const poly16_t* __a)
+ {
+- return (poly8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return vdupq_n_p16 (*__a);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vdup_n_p16 (poly16_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_s8 (const int8_t* __a)
+ {
+- return (poly16x4_t) {__a, __a, __a, __a};
++ return vdupq_n_s8 (*__a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vdup_n_s8 (int8_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_s16 (const int16_t* __a)
+ {
+- return (int8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return vdupq_n_s16 (*__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vdup_n_s16 (int16_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_s32 (const int32_t* __a)
+ {
+- return (int16x4_t) {__a, __a, __a, __a};
++ return vdupq_n_s32 (*__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vdup_n_s32 (int32_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_s64 (const int64_t* __a)
+ {
+- return (int32x2_t) {__a, __a};
++ return vdupq_n_s64 (*__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vdup_n_s64 (int64_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_u8 (const uint8_t* __a)
+ {
+- return (int64x1_t) {__a};
++ return vdupq_n_u8 (*__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vdup_n_u8 (uint8_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_u16 (const uint16_t* __a)
+ {
+- return (uint8x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return vdupq_n_u16 (*__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vdup_n_u16 (uint16_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_u32 (const uint32_t* __a)
+ {
+- return (uint16x4_t) {__a, __a, __a, __a};
++ return vdupq_n_u32 (*__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vdup_n_u32 (uint32_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_dup_u64 (const uint64_t* __a)
+ {
+- return (uint32x2_t) {__a, __a};
++ return vdupq_n_u64 (*__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vdup_n_u64 (uint64_t __a)
++/* vld1_lane */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_f16 (const float16_t *__src, float16x4_t __vec, const int __lane)
+ {
+- return (uint64x1_t) {__a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-/* vdupq_n */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vdupq_n_f32 (float32_t __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_f32 (const float32_t *__src, float32x2_t __vec, const int __lane)
+ {
+- return (float32x4_t) {__a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vdupq_n_f64 (float64_t __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
+ {
+- return (float64x2_t) {__a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vdupq_n_p8 (uint32_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_p8 (const poly8_t *__src, poly8x8_t __vec, const int __lane)
+ {
+- return (poly8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
+- __a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vdupq_n_p16 (uint32_t __a)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_p16 (const poly16_t *__src, poly16x4_t __vec, const int __lane)
+ {
+- return (poly16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vdupq_n_s8 (int32_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_s8 (const int8_t *__src, int8x8_t __vec, const int __lane)
+ {
+- return (int8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
+- __a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vdupq_n_s16 (int32_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_s16 (const int16_t *__src, int16x4_t __vec, const int __lane)
+ {
+- return (int16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vdupq_n_s32 (int32_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_s32 (const int32_t *__src, int32x2_t __vec, const int __lane)
+ {
+- return (int32x4_t) {__a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vdupq_n_s64 (int64_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
+ {
+- return (int64x2_t) {__a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vdupq_n_u8 (uint32_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_u8 (const uint8_t *__src, uint8x8_t __vec, const int __lane)
+ {
+- return (uint8x16_t) {__a, __a, __a, __a, __a, __a, __a, __a,
+- __a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vdupq_n_u16 (uint32_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_u16 (const uint16_t *__src, uint16x4_t __vec, const int __lane)
+ {
+- return (uint16x8_t) {__a, __a, __a, __a, __a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vdupq_n_u32 (uint32_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_u32 (const uint32_t *__src, uint32x2_t __vec, const int __lane)
+ {
+- return (uint32x4_t) {__a, __a, __a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vdupq_n_u64 (uint64_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
+ {
+- return (uint64x2_t) {__a, __a};
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-/* vdup_lane */
++/* vld1q_lane */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vdup_lane_f32 (float32x2_t __a, const int __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_f16 (const float16_t *__src, float16x8_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_f32 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vdup_lane_f64 (float64x1_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_f32 (const float32_t *__src, float32x4_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_f64 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vdup_lane_p8 (poly8x8_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_p8 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vdup_lane_p16 (poly16x4_t __a, const int __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_p8 (const poly8_t *__src, poly8x16_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_p16 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vdup_lane_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_p16 (const poly16_t *__src, poly16x8_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_s8 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vdup_lane_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_s8 (const int8_t *__src, int8x16_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_s16 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vdup_lane_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_s16 (const int16_t *__src, int16x8_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_s32 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vdup_lane_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_s32 (const int32_t *__src, int32x4_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_s64 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vdup_lane_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_u8 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vdup_lane_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_u8 (const uint8_t *__src, uint8x16_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_u16 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vdup_lane_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_u16 (const uint16_t *__src, uint16x8_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_u32 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vdup_lane_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_u32 (const uint32_t *__src, uint32x4_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_lane_u64 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-/* vdup_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vdup_laneq_f32 (float32x4_t __a, const int __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
+ {
+- return __aarch64_vdup_laneq_f32 (__a, __b);
++ return __aarch64_vset_lane_any (*__src, __vec, __lane);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vdup_laneq_f64 (float64x2_t __a, const int __b)
++/* vldn */
++
++__extension__ extern __inline int64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_s64 (const int64_t * __a)
+ {
+- return __aarch64_vdup_laneq_f64 (__a, __b);
++ int64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vdup_laneq_p8 (poly8x16_t __a, const int __b)
++__extension__ extern __inline uint64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_u64 (const uint64_t * __a)
+ {
+- return __aarch64_vdup_laneq_p8 (__a, __b);
++ uint64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vdup_laneq_p16 (poly16x8_t __a, const int __b)
++__extension__ extern __inline float64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_f64 (const float64_t * __a)
+ {
+- return __aarch64_vdup_laneq_p16 (__a, __b);
++ float64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)};
++ return ret;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vdup_laneq_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline int8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_s8 (const int8_t * __a)
+ {
+- return __aarch64_vdup_laneq_s8 (__a, __b);
++ int8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vdup_laneq_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline poly8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_p8 (const poly8_t * __a)
+ {
+- return __aarch64_vdup_laneq_s16 (__a, __b);
++ poly8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vdup_laneq_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline int16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_s16 (const int16_t * __a)
+ {
+- return __aarch64_vdup_laneq_s32 (__a, __b);
++ int16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vdup_laneq_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline poly16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_p16 (const poly16_t * __a)
+ {
+- return __aarch64_vdup_laneq_s64 (__a, __b);
++ poly16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vdup_laneq_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline int32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_s32 (const int32_t * __a)
+ {
+- return __aarch64_vdup_laneq_u8 (__a, __b);
++ int32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vdup_laneq_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline uint8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_u8 (const uint8_t * __a)
+ {
+- return __aarch64_vdup_laneq_u16 (__a, __b);
++ uint8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vdup_laneq_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline uint16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_u16 (const uint16_t * __a)
+ {
+- return __aarch64_vdup_laneq_u32 (__a, __b);
++ uint16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vdup_laneq_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline uint32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_u32 (const uint32_t * __a)
+ {
+- return __aarch64_vdup_laneq_u64 (__a, __b);
++ uint32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
++ return ret;
+ }
+
+-/* vdupq_lane */
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vdupq_lane_f32 (float32x2_t __a, const int __b)
++__extension__ extern __inline float16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_f16 (const float16_t * __a)
+ {
+- return __aarch64_vdupq_lane_f32 (__a, __b);
++ float16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4hf (__a);
++ ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_dregoiv4hf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vdupq_lane_f64 (float64x1_t __a, const int __b)
++__extension__ extern __inline float32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_f32 (const float32_t * __a)
+ {
+- return __aarch64_vdupq_lane_f64 (__a, __b);
++ float32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vdupq_lane_p8 (poly8x8_t __a, const int __b)
++__extension__ extern __inline int8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_s8 (const int8_t * __a)
+ {
+- return __aarch64_vdupq_lane_p8 (__a, __b);
++ int8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vdupq_lane_p16 (poly16x4_t __a, const int __b)
++__extension__ extern __inline poly8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_p8 (const poly8_t * __a)
+ {
+- return __aarch64_vdupq_lane_p16 (__a, __b);
++ poly8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vdupq_lane_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline int16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_s16 (const int16_t * __a)
+ {
+- return __aarch64_vdupq_lane_s8 (__a, __b);
++ int16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vdupq_lane_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline poly16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_p16 (const poly16_t * __a)
+ {
+- return __aarch64_vdupq_lane_s16 (__a, __b);
++ poly16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vdupq_lane_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline int32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_s32 (const int32_t * __a)
+ {
+- return __aarch64_vdupq_lane_s32 (__a, __b);
++ int32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vdupq_lane_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline int64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_s64 (const int64_t * __a)
+ {
+- return __aarch64_vdupq_lane_s64 (__a, __b);
++ int64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vdupq_lane_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline uint8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_u8 (const uint8_t * __a)
+ {
+- return __aarch64_vdupq_lane_u8 (__a, __b);
++ uint8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vdupq_lane_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline uint16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_u16 (const uint16_t * __a)
+ {
+- return __aarch64_vdupq_lane_u16 (__a, __b);
++ uint16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vdupq_lane_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline uint32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_u32 (const uint32_t * __a)
+ {
+- return __aarch64_vdupq_lane_u32 (__a, __b);
++ uint32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vdupq_lane_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline uint64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_u64 (const uint64_t * __a)
+ {
+- return __aarch64_vdupq_lane_u64 (__a, __b);
++ uint64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
++ return ret;
+ }
+
+-/* vdupq_laneq */
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vdupq_laneq_f32 (float32x4_t __a, const int __b)
++__extension__ extern __inline float16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_f16 (const float16_t * __a)
+ {
+- return __aarch64_vdupq_laneq_f32 (__a, __b);
++ float16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v8hf (__a);
++ ret.val[0] = __builtin_aarch64_get_qregoiv8hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vdupq_laneq_f64 (float64x2_t __a, const int __b)
++__extension__ extern __inline float32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_f32 (const float32_t * __a)
+ {
+- return __aarch64_vdupq_laneq_f64 (__a, __b);
++ float32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vdupq_laneq_p8 (poly8x16_t __a, const int __b)
++__extension__ extern __inline float64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_f64 (const float64_t * __a)
+ {
+- return __aarch64_vdupq_laneq_p8 (__a, __b);
++ float64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vdupq_laneq_p16 (poly16x8_t __a, const int __b)
++__extension__ extern __inline int64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_s64 (const int64_t * __a)
+ {
+- return __aarch64_vdupq_laneq_p16 (__a, __b);
++ int64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
++ ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vdupq_laneq_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline uint64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_u64 (const uint64_t * __a)
+ {
+- return __aarch64_vdupq_laneq_s8 (__a, __b);
++ uint64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
++ ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vdupq_laneq_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline float64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_f64 (const float64_t * __a)
+ {
+- return __aarch64_vdupq_laneq_s16 (__a, __b);
++ float64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)};
++ ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)};
++ return ret;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vdupq_laneq_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline int8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_s8 (const int8_t * __a)
+ {
+- return __aarch64_vdupq_laneq_s32 (__a, __b);
++ int8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vdupq_laneq_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline poly8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_p8 (const poly8_t * __a)
+ {
+- return __aarch64_vdupq_laneq_s64 (__a, __b);
++ poly8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vdupq_laneq_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline int16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_s16 (const int16_t * __a)
+ {
+- return __aarch64_vdupq_laneq_u8 (__a, __b);
++ int16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vdupq_laneq_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline poly16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_p16 (const poly16_t * __a)
+ {
+- return __aarch64_vdupq_laneq_u16 (__a, __b);
++ poly16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vdupq_laneq_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline int32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_s32 (const int32_t * __a)
+ {
+- return __aarch64_vdupq_laneq_u32 (__a, __b);
++ int32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
++ ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vdupq_laneq_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline uint8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_u8 (const uint8_t * __a)
+ {
+- return __aarch64_vdupq_laneq_u64 (__a, __b);
++ uint8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-/* vdupb_lane */
+-__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
+-vdupb_lane_p8 (poly8x8_t __a, const int __b)
++__extension__ extern __inline uint16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_u16 (const uint16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ uint16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vdupb_lane_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline uint32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_u32 (const uint32_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ uint32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
++ ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vdupb_lane_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline float16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_f16 (const float16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4hf (__a);
++ ret.val[0] = __builtin_aarch64_get_dregciv4hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_dregciv4hf (__o, 1);
++ ret.val[2] = __builtin_aarch64_get_dregciv4hf (__o, 2);
++ return ret;
+ }
+
+-/* vduph_lane */
+-__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
+-vduph_lane_p16 (poly16x4_t __a, const int __b)
++__extension__ extern __inline float32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_f32 (const float32_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1);
++ ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vduph_lane_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline int8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_s8 (const int8_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vduph_lane_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline poly8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_p8 (const poly8_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ poly8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-/* vdups_lane */
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vdups_lane_f32 (float32x2_t __a, const int __b)
++__extension__ extern __inline int16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_s16 (const int16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vdups_lane_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline poly16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_p16 (const poly16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ poly16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vdups_lane_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline int32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_s32 (const int32_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
++ ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
++ return ret;
+ }
+
+-/* vdupd_lane */
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vdupd_lane_f64 (float64x1_t __a, const int __b)
++__extension__ extern __inline int64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_s64 (const int64_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __b);
+- return __a[0];
++ int64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
++ ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vdupd_lane_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline uint8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_u8 (const uint8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __b);
+- return __a[0];
++ uint8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vdupd_lane_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline uint16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_u16 (const uint16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __b);
+- return __a[0];
++ uint16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-/* vdupb_laneq */
+-__extension__ static __inline poly8_t __attribute__ ((__always_inline__))
+-vdupb_laneq_p8 (poly8x16_t __a, const int __b)
++__extension__ extern __inline uint32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_u32 (const uint32_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ uint32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
++ ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vdupb_laneq_s8 (int8x16_t __a, const int __attribute__ ((unused)) __b)
++__extension__ extern __inline uint64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_u64 (const uint64_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ uint64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
++ ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vdupb_laneq_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline float16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_f16 (const float16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v8hf (__a);
++ ret.val[0] = __builtin_aarch64_get_qregciv8hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_qregciv8hf (__o, 1);
++ ret.val[2] = __builtin_aarch64_get_qregciv8hf (__o, 2);
++ return ret;
+ }
+
+-/* vduph_laneq */
+-__extension__ static __inline poly16_t __attribute__ ((__always_inline__))
+-vduph_laneq_p16 (poly16x8_t __a, const int __b)
++__extension__ extern __inline float32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_f32 (const float32_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1);
++ ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vduph_laneq_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline float64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_f64 (const float64_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1);
++ ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vduph_laneq_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline int64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_s64 (const int64_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
++ ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
++ ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
++ return ret;
+ }
+
+-/* vdups_laneq */
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vdups_laneq_f32 (float32x4_t __a, const int __b)
++__extension__ extern __inline uint64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_u64 (const uint64_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ uint64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
++ ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
++ ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vdups_laneq_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline float64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_f64 (const float64_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ float64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)};
++ ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)};
++ ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)};
++ return ret;
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vdups_laneq_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline int8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_s8 (const int8_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-/* vdupd_laneq */
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vdupd_laneq_f64 (float64x2_t __a, const int __b)
++__extension__ extern __inline poly8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_p8 (const poly8_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ poly8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vdupd_laneq_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline int16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_s16 (const int16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ int16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vdupd_laneq_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline poly16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_p16 (const poly16_t * __a)
+ {
+- return __aarch64_vget_lane_any (__a, __b);
++ poly16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-/* vext */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vext_f32 (float32x2_t __a, float32x2_t __b, __const int __c)
++__extension__ extern __inline int32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_s32 (const int32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
+-#endif
++ int32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
++ ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
++ ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vext_f64 (float64x1_t __a, float64x1_t __b, __const int __c)
+-{
+- __AARCH64_LANE_CHECK (__a, __c);
+- /* The only possible index to the assembler instruction returns element 0. */
+- return __a;
+-}
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vext_p8 (poly8x8_t __a, poly8x8_t __b, __const int __c)
++__extension__ extern __inline uint8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_u8 (const uint8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ uint8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vext_p16 (poly16x4_t __a, poly16x4_t __b, __const int __c)
++__extension__ extern __inline uint16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_u16 (const uint16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ uint16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vext_s8 (int8x8_t __a, int8x8_t __b, __const int __c)
++__extension__ extern __inline uint32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_u32 (const uint32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ uint32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
++ ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
++ ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vext_s16 (int16x4_t __a, int16x4_t __b, __const int __c)
++__extension__ extern __inline float16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_f16 (const float16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ float16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4hf (__a);
++ ret.val[0] = __builtin_aarch64_get_dregxiv4hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_dregxiv4hf (__o, 1);
++ ret.val[2] = __builtin_aarch64_get_dregxiv4hf (__o, 2);
++ ret.val[3] = __builtin_aarch64_get_dregxiv4hf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vext_s32 (int32x2_t __a, int32x2_t __b, __const int __c)
++__extension__ extern __inline float32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_f32 (const float32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
+-#endif
++ float32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1);
++ ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2);
++ ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vext_s64 (int64x1_t __a, int64x1_t __b, __const int __c)
++__extension__ extern __inline int8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_s8 (const int8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+- /* The only possible index to the assembler instruction returns element 0. */
+- return __a;
++ int8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vext_u8 (uint8x8_t __a, uint8x8_t __b, __const int __c)
++__extension__ extern __inline poly8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_p8 (const poly8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ poly8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vext_u16 (uint16x4_t __a, uint16x4_t __b, __const int __c)
++__extension__ extern __inline int16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_s16 (const int16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint16x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ int16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vext_u32 (uint32x2_t __a, uint32x2_t __b, __const int __c)
++__extension__ extern __inline poly16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_p16 (const poly16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint32x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {__c, __c+1});
+-#endif
++ poly16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)
++__extension__ extern __inline int32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_s32 (const int32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+- /* The only possible index to the assembler instruction returns element 0. */
+- return __a;
++ int32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
++ ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
++ ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vextq_f32 (float32x4_t __a, float32x4_t __b, __const int __c)
++__extension__ extern __inline int64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_s64 (const int64_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ int64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
++ ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
++ ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vextq_f64 (float64x2_t __a, float64x2_t __b, __const int __c)
++__extension__ extern __inline uint8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_u8 (const uint8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
+-#endif
++ uint8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vextq_p8 (poly8x16_t __a, poly8x16_t __b, __const int __c)
++__extension__ extern __inline uint16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_u16 (const uint16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x16_t)
+- {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
+- 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
+- __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
+-#endif
++ uint16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vextq_p16 (poly16x8_t __a, poly16x8_t __b, __const int __c)
++__extension__ extern __inline uint32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_u32 (const uint32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint16x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ uint32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
++ ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
++ ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vextq_s8 (int8x16_t __a, int8x16_t __b, __const int __c)
++__extension__ extern __inline uint64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_u64 (const uint64_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x16_t)
+- {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
+- 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
+- __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
+-#endif
++ uint64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
++ ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
++ ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vextq_s16 (int16x8_t __a, int16x8_t __b, __const int __c)
++__extension__ extern __inline float16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_f16 (const float16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint16x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ float16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v8hf (__a);
++ ret.val[0] = __builtin_aarch64_get_qregxiv8hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_qregxiv8hf (__o, 1);
++ ret.val[2] = __builtin_aarch64_get_qregxiv8hf (__o, 2);
++ ret.val[3] = __builtin_aarch64_get_qregxiv8hf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vextq_s32 (int32x4_t __a, int32x4_t __b, __const int __c)
++__extension__ extern __inline float32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_f32 (const float32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ float32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1);
++ ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2);
++ ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vextq_s64 (int64x2_t __a, int64x2_t __b, __const int __c)
++__extension__ extern __inline float64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_f64 (const float64_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
+-#endif
++ float64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1);
++ ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2);
++ ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vextq_u8 (uint8x16_t __a, uint8x16_t __b, __const int __c)
++/* vldn_dup */
++
++__extension__ extern __inline int8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_s8 (const int8_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint8x16_t)
+- {16-__c, 17-__c, 18-__c, 19-__c, 20-__c, 21-__c, 22-__c, 23-__c,
+- 24-__c, 25-__c, 26-__c, 27-__c, 28-__c, 29-__c, 30-__c, 31-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7,
+- __c+8, __c+9, __c+10, __c+11, __c+12, __c+13, __c+14, __c+15});
+-#endif
++ int8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vextq_u16 (uint16x8_t __a, uint16x8_t __b, __const int __c)
++__extension__ extern __inline int16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_s16 (const int16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint16x8_t)
+- {8-__c, 9-__c, 10-__c, 11-__c, 12-__c, 13-__c, 14-__c, 15-__c});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint16x8_t) {__c, __c+1, __c+2, __c+3, __c+4, __c+5, __c+6, __c+7});
+-#endif
++ int16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vextq_u32 (uint32x4_t __a, uint32x4_t __b, __const int __c)
++__extension__ extern __inline int32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_s32 (const int32_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a,
+- (uint32x4_t) {4-__c, 5-__c, 6-__c, 7-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {__c, __c+1, __c+2, __c+3});
+-#endif
++ int32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vextq_u64 (uint64x2_t __a, uint64x2_t __b, __const int __c)
++__extension__ extern __inline float16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_f16 (const float16_t * __a)
+ {
+- __AARCH64_LANE_CHECK (__a, __c);
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__b, __a, (uint64x2_t) {2-__c, 3-__c});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {__c, __c+1});
+-#endif
++ float16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
++ ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 1);
++ return ret;
+ }
+
+-/* vfma */
+-
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfma_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
++__extension__ extern __inline float32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_f32 (const float32_t * __a)
+ {
+- return (float64x1_t) {__builtin_fma (__b[0], __c[0], __a[0])};
++ float32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfma_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
++__extension__ extern __inline float64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_f64 (const float64_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (__b, __c, __a);
++ float64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rdf ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)};
++ return ret;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmaq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
++__extension__ extern __inline uint8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_u8 (const uint8_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (__b, __c, __a);
++ uint8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmaq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c)
++__extension__ extern __inline uint16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_u16 (const uint16_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (__b, __c, __a);
++ uint16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
++__extension__ extern __inline uint32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_u32 (const uint32_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (__b, vdup_n_f32 (__c), __a);
++ uint32x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
++__extension__ extern __inline poly8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_p8 (const poly8_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (__b, vdupq_n_f32 (__c), __a);
++ poly8x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmaq_n_f64 (float64x2_t __a, float64x2_t __b, float64_t __c)
++__extension__ extern __inline poly16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_p16 (const poly16_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (__b, vdupq_n_f64 (__c), __a);
++ poly16x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
++ return ret;
+ }
+
+-/* vfma_lane */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfma_lane_f32 (float32x2_t __a, float32x2_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline int64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_s64 (const int64_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (__b,
+- __aarch64_vdup_lane_f32 (__c, __lane),
+- __a);
++ int64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfma_lane_f64 (float64x1_t __a, float64x1_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline uint64x1x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2_dup_u64 (const uint64_t * __a)
+ {
+- return (float64x1_t) {__builtin_fma (__b[0], __c[0], __a[0])};
++ uint64x1x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vfmad_lane_f64 (float64_t __a, float64_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline int8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_s8 (const int8_t * __a)
+ {
+- return __builtin_fma (__b, __c[0], __a);
++ int8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vfmas_lane_f32 (float32_t __a, float32_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline poly8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_p8 (const poly8_t * __a)
+ {
+- return __builtin_fmaf (__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ poly8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-/* vfma_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfma_laneq_f32 (float32x2_t __a, float32x2_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_s16 (const int16_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (__b,
+- __aarch64_vdup_laneq_f32 (__c, __lane),
+- __a);
++ int16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfma_laneq_f64 (float64x1_t __a, float64x1_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline poly16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_p16 (const poly16_t * __a)
+ {
+- float64_t __c0 = __aarch64_vget_lane_any (__c, __lane);
+- return (float64x1_t) {__builtin_fma (__b[0], __c0, __a[0])};
++ poly16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vfmad_laneq_f64 (float64_t __a, float64_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline int32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_s32 (const int32_t * __a)
+ {
+- return __builtin_fma (__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ int32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vfmas_laneq_f32 (float32_t __a, float32_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_s64 (const int64_t * __a)
+ {
+- return __builtin_fmaf (__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ int64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
++ return ret;
+ }
+
+-/* vfmaq_lane */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmaq_lane_f32 (float32x4_t __a, float32x4_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline uint8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_u8 (const uint8_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (__b,
+- __aarch64_vdupq_lane_f32 (__c, __lane),
+- __a);
++ uint8x16x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmaq_lane_f64 (float64x2_t __a, float64x2_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline uint16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_u16 (const uint16_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (__b, vdupq_n_f64 (__c[0]), __a);
++ uint16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
++ return ret;
+ }
+
+-/* vfmaq_laneq */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmaq_laneq_f32 (float32x4_t __a, float32x4_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline uint32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_u32 (const uint32_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (__b,
+- __aarch64_vdupq_laneq_f32 (__c, __lane),
+- __a);
++ uint32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmaq_laneq_f64 (float64x2_t __a, float64x2_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline uint64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_u64 (const uint64_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (__b,
+- __aarch64_vdupq_laneq_f64 (__c, __lane),
+- __a);
++ uint64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
++ return ret;
+ }
+
+-/* vfms */
+-
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfms_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
++__extension__ extern __inline float16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_f16 (const float16_t * __a)
+ {
+- return (float64x1_t) {__builtin_fma (-__b[0], __c[0], __a[0])};
++ float16x8x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv8hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 0);
++ ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c)
++__extension__ extern __inline float32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_f32 (const float32_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (-__b, __c, __a);
++ float32x4x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c)
++__extension__ extern __inline float64x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld2q_dup_f64 (const float64_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (-__b, __c, __a);
++ float64x2x2_t ret;
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_ld2rv2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c)
++__extension__ extern __inline int64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_s64 (const int64_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (-__b, __c, __a);
++ int64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
++ ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
++ return ret;
+ }
+
+-
+-/* vfms_lane */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfms_lane_f32 (float32x2_t __a, float32x2_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline uint64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_u64 (const uint64_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (-__b,
+- __aarch64_vdup_lane_f32 (__c, __lane),
+- __a);
++ uint64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
++ ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfms_lane_f64 (float64x1_t __a, float64x1_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline float64x1x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_f64 (const float64_t * __a)
+ {
+- return (float64x1_t) {__builtin_fma (-__b[0], __c[0], __a[0])};
++ float64x1x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rdf ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)};
++ ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)};
++ return ret;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vfmsd_lane_f64 (float64_t __a, float64_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline int8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_s8 (const int8_t * __a)
+ {
+- return __builtin_fma (-__b, __c[0], __a);
++ int8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vfmss_lane_f32 (float32_t __a, float32_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline poly8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_p8 (const poly8_t * __a)
+ {
+- return __builtin_fmaf (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ poly8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-/* vfms_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vfms_laneq_f32 (float32x2_t __a, float32x2_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_s16 (const int16_t * __a)
+ {
+- return __builtin_aarch64_fmav2sf (-__b,
+- __aarch64_vdup_laneq_f32 (__c, __lane),
+- __a);
++ int16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vfms_laneq_f64 (float64x1_t __a, float64x1_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline poly16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_p16 (const poly16_t * __a)
+ {
+- float64_t __c0 = __aarch64_vget_lane_any (__c, __lane);
+- return (float64x1_t) {__builtin_fma (-__b[0], __c0, __a[0])};
++ poly16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vfmsd_laneq_f64 (float64_t __a, float64_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline int32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_s32 (const int32_t * __a)
+ {
+- return __builtin_fma (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ int32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
++ ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vfmss_laneq_f32 (float32_t __a, float32_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline uint8x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_u8 (const uint8_t * __a)
+ {
+- return __builtin_fmaf (-__b, __aarch64_vget_lane_any (__c, __lane), __a);
++ uint8x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
++ ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
++ return ret;
+ }
+
+-/* vfmsq_lane */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmsq_lane_f32 (float32x4_t __a, float32x4_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline uint16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_u16 (const uint16_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (-__b,
+- __aarch64_vdupq_lane_f32 (__c, __lane),
+- __a);
++ uint16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
++ ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmsq_lane_f64 (float64x2_t __a, float64x2_t __b,
+- float64x1_t __c, const int __lane)
++__extension__ extern __inline uint32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_u32 (const uint32_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (-__b, vdupq_n_f64 (__c[0]), __a);
++ uint32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
++ ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
++ return ret;
+ }
+
+-/* vfmsq_laneq */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vfmsq_laneq_f32 (float32x4_t __a, float32x4_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline float16x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_f16 (const float16_t * __a)
+ {
+- return __builtin_aarch64_fmav4sf (-__b,
+- __aarch64_vdupq_laneq_f32 (__c, __lane),
+- __a);
++ float16x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0);
++ ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1);
++ ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vfmsq_laneq_f64 (float64x2_t __a, float64x2_t __b,
+- float64x2_t __c, const int __lane)
++__extension__ extern __inline float32x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3_dup_f32 (const float32_t * __a)
+ {
+- return __builtin_aarch64_fmav2df (-__b,
+- __aarch64_vdupq_laneq_f64 (__c, __lane),
+- __a);
++ float32x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1);
++ ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2);
++ return ret;
+ }
+
+-/* vld1 */
+-
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+-vld1_f16 (const float16_t *__a)
++__extension__ extern __inline int8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_s8 (const int8_t * __a)
+ {
+- return __builtin_aarch64_ld1v4hf (__a);
++ int8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vld1_f32 (const float32_t *a)
++__extension__ extern __inline poly8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_p8 (const poly8_t * __a)
+ {
+- return __builtin_aarch64_ld1v2sf ((const __builtin_aarch64_simd_sf *) a);
++ poly8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vld1_f64 (const float64_t *a)
++__extension__ extern __inline int16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_s16 (const int16_t * __a)
+ {
+- return (float64x1_t) {*a};
++ int16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vld1_p8 (const poly8_t *a)
++__extension__ extern __inline poly16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_p16 (const poly16_t * __a)
+ {
+- return (poly8x8_t)
+- __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
++ poly16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vld1_p16 (const poly16_t *a)
++__extension__ extern __inline int32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_s32 (const int32_t * __a)
+ {
+- return (poly16x4_t)
+- __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
++ int32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
++ ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vld1_s8 (const int8_t *a)
++__extension__ extern __inline int64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_s64 (const int64_t * __a)
+ {
+- return __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
++ int64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
++ ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vld1_s16 (const int16_t *a)
++__extension__ extern __inline uint8x16x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_u8 (const uint8_t * __a)
+ {
+- return __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
++ uint8x16x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
++ ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vld1_s32 (const int32_t *a)
++__extension__ extern __inline uint16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_u16 (const uint16_t * __a)
+ {
+- return __builtin_aarch64_ld1v2si ((const __builtin_aarch64_simd_si *) a);
++ uint16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
++ ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vld1_s64 (const int64_t *a)
++__extension__ extern __inline uint32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_u32 (const uint32_t * __a)
+ {
+- return (int64x1_t) {*a};
++ uint32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
++ ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vld1_u8 (const uint8_t *a)
++__extension__ extern __inline uint64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_u64 (const uint64_t * __a)
+ {
+- return (uint8x8_t)
+- __builtin_aarch64_ld1v8qi ((const __builtin_aarch64_simd_qi *) a);
++ uint64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
++ ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vld1_u16 (const uint16_t *a)
++__extension__ extern __inline float16x8x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_f16 (const float16_t * __a)
+ {
+- return (uint16x4_t)
+- __builtin_aarch64_ld1v4hi ((const __builtin_aarch64_simd_hi *) a);
++ float16x8x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv8hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0);
++ ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1);
++ ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vld1_u32 (const uint32_t *a)
++__extension__ extern __inline float32x4x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_f32 (const float32_t * __a)
+ {
+- return (uint32x2_t)
+- __builtin_aarch64_ld1v2si ((const __builtin_aarch64_simd_si *) a);
++ float32x4x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1);
++ ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vld1_u64 (const uint64_t *a)
++__extension__ extern __inline float64x2x3_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld3q_dup_f64 (const float64_t * __a)
+ {
+- return (uint64x1_t) {*a};
++ float64x2x3_t ret;
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_ld3rv2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1);
++ ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2);
++ return ret;
+ }
+
+-/* vld1q */
+-
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+-vld1q_f16 (const float16_t *__a)
++__extension__ extern __inline int64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_s64 (const int64_t * __a)
+ {
+- return __builtin_aarch64_ld1v8hf (__a);
++ int64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
++ ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
++ ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
++ ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vld1q_f32 (const float32_t *a)
++__extension__ extern __inline uint64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_u64 (const uint64_t * __a)
+ {
+- return __builtin_aarch64_ld1v4sf ((const __builtin_aarch64_simd_sf *) a);
++ uint64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
++ ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
++ ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
++ ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vld1q_f64 (const float64_t *a)
++__extension__ extern __inline float64x1x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_f64 (const float64_t * __a)
+ {
+- return __builtin_aarch64_ld1v2df ((const __builtin_aarch64_simd_df *) a);
++ float64x1x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rdf ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)};
++ ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)};
++ ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)};
++ ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)};
++ return ret;
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vld1q_p8 (const poly8_t *a)
++__extension__ extern __inline int8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_s8 (const int8_t * __a)
+ {
+- return (poly8x16_t)
+- __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
++ int8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vld1q_p16 (const poly16_t *a)
++__extension__ extern __inline poly8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_p8 (const poly8_t * __a)
+ {
+- return (poly16x8_t)
+- __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
++ poly8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vld1q_s8 (const int8_t *a)
++__extension__ extern __inline int16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_s16 (const int16_t * __a)
+ {
+- return __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
++ int16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vld1q_s16 (const int16_t *a)
++__extension__ extern __inline poly16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_p16 (const poly16_t * __a)
+ {
+- return __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
++ poly16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vld1q_s32 (const int32_t *a)
++__extension__ extern __inline int32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_s32 (const int32_t * __a)
+ {
+- return __builtin_aarch64_ld1v4si ((const __builtin_aarch64_simd_si *) a);
++ int32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
++ ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
++ ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
++ ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vld1q_s64 (const int64_t *a)
++__extension__ extern __inline uint8x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_u8 (const uint8_t * __a)
+ {
+- return __builtin_aarch64_ld1v2di ((const __builtin_aarch64_simd_di *) a);
++ uint8x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
++ ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
++ ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
++ ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vld1q_u8 (const uint8_t *a)
++__extension__ extern __inline uint16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_u16 (const uint16_t * __a)
+ {
+- return (uint8x16_t)
+- __builtin_aarch64_ld1v16qi ((const __builtin_aarch64_simd_qi *) a);
++ uint16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
++ ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
++ ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
++ ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vld1q_u16 (const uint16_t *a)
++__extension__ extern __inline uint32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_u32 (const uint32_t * __a)
+ {
+- return (uint16x8_t)
+- __builtin_aarch64_ld1v8hi ((const __builtin_aarch64_simd_hi *) a);
++ uint32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
++ ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
++ ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
++ ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vld1q_u32 (const uint32_t *a)
++__extension__ extern __inline float16x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_f16 (const float16_t * __a)
+ {
+- return (uint32x4_t)
+- __builtin_aarch64_ld1v4si ((const __builtin_aarch64_simd_si *) a);
++ float16x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 0);
++ ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 1);
++ ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 2);
++ ret.val[3] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vld1q_u64 (const uint64_t *a)
++__extension__ extern __inline float32x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4_dup_f32 (const float32_t * __a)
+ {
+- return (uint64x2_t)
+- __builtin_aarch64_ld1v2di ((const __builtin_aarch64_simd_di *) a);
++ float32x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0);
++ ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1);
++ ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2);
++ ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3);
++ return ret;
+ }
+
+-/* vld1_dup */
+-
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+-vld1_dup_f16 (const float16_t* __a)
++__extension__ extern __inline int8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_s8 (const int8_t * __a)
+ {
+- float16_t __f = *__a;
+- return (float16x4_t) { __f, __f, __f, __f };
++ int8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vld1_dup_f32 (const float32_t* __a)
++__extension__ extern __inline poly8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_p8 (const poly8_t * __a)
+ {
+- return vdup_n_f32 (*__a);
++ poly8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vld1_dup_f64 (const float64_t* __a)
++__extension__ extern __inline int16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_s16 (const int16_t * __a)
+ {
+- return vdup_n_f64 (*__a);
++ int16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vld1_dup_p8 (const poly8_t* __a)
++__extension__ extern __inline poly16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_p16 (const poly16_t * __a)
+ {
+- return vdup_n_p8 (*__a);
++ poly16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vld1_dup_p16 (const poly16_t* __a)
++__extension__ extern __inline int32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_s32 (const int32_t * __a)
+ {
+- return vdup_n_p16 (*__a);
++ int32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
++ ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
++ ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
++ ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vld1_dup_s8 (const int8_t* __a)
++__extension__ extern __inline int64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_s64 (const int64_t * __a)
+ {
+- return vdup_n_s8 (*__a);
++ int64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
++ ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
++ ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
++ ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vld1_dup_s16 (const int16_t* __a)
++__extension__ extern __inline uint8x16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_u8 (const uint8_t * __a)
+ {
+- return vdup_n_s16 (*__a);
++ uint8x16x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
++ ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
++ ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
++ ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
++ ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vld1_dup_s32 (const int32_t* __a)
++__extension__ extern __inline uint16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_u16 (const uint16_t * __a)
+ {
+- return vdup_n_s32 (*__a);
++ uint16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
++ ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
++ ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
++ ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
++ ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vld1_dup_s64 (const int64_t* __a)
++__extension__ extern __inline uint32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_u32 (const uint32_t * __a)
+ {
+- return vdup_n_s64 (*__a);
++ uint32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a);
++ ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
++ ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
++ ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
++ ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vld1_dup_u8 (const uint8_t* __a)
++__extension__ extern __inline uint64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_u64 (const uint64_t * __a)
+ {
+- return vdup_n_u8 (*__a);
++ uint64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a);
++ ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
++ ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
++ ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
++ ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vld1_dup_u16 (const uint16_t* __a)
++__extension__ extern __inline float16x8x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_f16 (const float16_t * __a)
+ {
+- return vdup_n_u16 (*__a);
++ float16x8x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv8hf ((const __builtin_aarch64_simd_hf *) __a);
++ ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 0);
++ ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 1);
++ ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 2);
++ ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vld1_dup_u32 (const uint32_t* __a)
++__extension__ extern __inline float32x4x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_f32 (const float32_t * __a)
+ {
+- return vdup_n_u32 (*__a);
++ float32x4x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv4sf ((const __builtin_aarch64_simd_sf *) __a);
++ ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0);
++ ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1);
++ ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2);
++ ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3);
++ return ret;
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vld1_dup_u64 (const uint64_t* __a)
++__extension__ extern __inline float64x2x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vld4q_dup_f64 (const float64_t * __a)
+ {
+- return vdup_n_u64 (*__a);
++ float64x2x4_t ret;
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_ld4rv2df ((const __builtin_aarch64_simd_df *) __a);
++ ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0);
++ ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1);
++ ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2);
++ ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3);
++ return ret;
+ }
+
+-/* vld1q_dup */
++/* vld2_lane */
+
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+-vld1q_dup_f16 (const float16_t* __a)
+-{
+- float16_t __f = *__a;
+- return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
++#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
++ qmode, ptrmode, funcsuffix, signedtype) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_oi __o; \
++ largetype __temp; \
++ __temp.val[0] = \
++ vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
++ __temp.val[1] = \
++ vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
++ __o = __builtin_aarch64_set_qregoi##qmode (__o, \
++ (signedtype) __temp.val[0], \
++ 0); \
++ __o = __builtin_aarch64_set_qregoi##qmode (__o, \
++ (signedtype) __temp.val[1], \
++ 1); \
++ __o = __builtin_aarch64_ld2_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ __b.val[0] = (vectype) __builtin_aarch64_get_dregoidi (__o, 0); \
++ __b.val[1] = (vectype) __builtin_aarch64_get_dregoidi (__o, 1); \
++ return __b; \
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vld1q_dup_f32 (const float32_t* __a)
+-{
+- return vdupq_n_f32 (*__a);
+-}
++__LD2_LANE_FUNC (float16x4x2_t, float16x4_t, float16x8x2_t, float16_t, v4hf,
++ v8hf, hf, f16, float16x8_t)
++__LD2_LANE_FUNC (float32x2x2_t, float32x2_t, float32x4x2_t, float32_t, v2sf, v4sf,
++ sf, f32, float32x4_t)
++__LD2_LANE_FUNC (float64x1x2_t, float64x1_t, float64x2x2_t, float64_t, df, v2df,
++ df, f64, float64x2_t)
++__LD2_LANE_FUNC (poly8x8x2_t, poly8x8_t, poly8x16x2_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__LD2_LANE_FUNC (poly16x4x2_t, poly16x4_t, poly16x8x2_t, poly16_t, v4hi, v8hi, hi,
++ p16, int16x8_t)
++__LD2_LANE_FUNC (int8x8x2_t, int8x8_t, int8x16x2_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__LD2_LANE_FUNC (int16x4x2_t, int16x4_t, int16x8x2_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__LD2_LANE_FUNC (int32x2x2_t, int32x2_t, int32x4x2_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__LD2_LANE_FUNC (int64x1x2_t, int64x1_t, int64x2x2_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__LD2_LANE_FUNC (uint8x8x2_t, uint8x8_t, uint8x16x2_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__LD2_LANE_FUNC (uint16x4x2_t, uint16x4_t, uint16x8x2_t, uint16_t, v4hi, v8hi, hi,
++ u16, int16x8_t)
++__LD2_LANE_FUNC (uint32x2x2_t, uint32x2_t, uint32x4x2_t, uint32_t, v2si, v4si, si,
++ u32, int32x4_t)
++__LD2_LANE_FUNC (uint64x1x2_t, uint64x1_t, uint64x2x2_t, uint64_t, di, v2di, di,
++ u64, int64x2_t)
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vld1q_dup_f64 (const float64_t* __a)
+-{
+- return vdupq_n_f64 (*__a);
+-}
++#undef __LD2_LANE_FUNC
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vld1q_dup_p8 (const poly8_t* __a)
+-{
+- return vdupq_n_p8 (*__a);
++/* vld2q_lane */
++
++#define __LD2_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld2q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_oi __o; \
++ intype ret; \
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); \
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); \
++ __o = __builtin_aarch64_ld2_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ ret.val[0] = (vtype) __builtin_aarch64_get_qregoiv4si (__o, 0); \
++ ret.val[1] = (vtype) __builtin_aarch64_get_qregoiv4si (__o, 1); \
++ return ret; \
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vld1q_dup_p16 (const poly16_t* __a)
+-{
+- return vdupq_n_p16 (*__a);
+-}
++__LD2_LANE_FUNC (float16x8x2_t, float16x8_t, float16_t, v8hf, hf, f16)
++__LD2_LANE_FUNC (float32x4x2_t, float32x4_t, float32_t, v4sf, sf, f32)
++__LD2_LANE_FUNC (float64x2x2_t, float64x2_t, float64_t, v2df, df, f64)
++__LD2_LANE_FUNC (poly8x16x2_t, poly8x16_t, poly8_t, v16qi, qi, p8)
++__LD2_LANE_FUNC (poly16x8x2_t, poly16x8_t, poly16_t, v8hi, hi, p16)
++__LD2_LANE_FUNC (int8x16x2_t, int8x16_t, int8_t, v16qi, qi, s8)
++__LD2_LANE_FUNC (int16x8x2_t, int16x8_t, int16_t, v8hi, hi, s16)
++__LD2_LANE_FUNC (int32x4x2_t, int32x4_t, int32_t, v4si, si, s32)
++__LD2_LANE_FUNC (int64x2x2_t, int64x2_t, int64_t, v2di, di, s64)
++__LD2_LANE_FUNC (uint8x16x2_t, uint8x16_t, uint8_t, v16qi, qi, u8)
++__LD2_LANE_FUNC (uint16x8x2_t, uint16x8_t, uint16_t, v8hi, hi, u16)
++__LD2_LANE_FUNC (uint32x4x2_t, uint32x4_t, uint32_t, v4si, si, u32)
++__LD2_LANE_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, v2di, di, u64)
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vld1q_dup_s8 (const int8_t* __a)
+-{
+- return vdupq_n_s8 (*__a);
+-}
++#undef __LD2_LANE_FUNC
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vld1q_dup_s16 (const int16_t* __a)
+-{
+- return vdupq_n_s16 (*__a);
+-}
++/* vld3_lane */
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vld1q_dup_s32 (const int32_t* __a)
+-{
+- return vdupq_n_s32 (*__a);
++#define __LD3_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
++ qmode, ptrmode, funcsuffix, signedtype) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld3_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_ci __o; \
++ largetype __temp; \
++ __temp.val[0] = \
++ vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
++ __temp.val[1] = \
++ vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
++ __temp.val[2] = \
++ vcombine_##funcsuffix (__b.val[2], vcreate_##funcsuffix (0)); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[0], \
++ 0); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[1], \
++ 1); \
++ __o = __builtin_aarch64_set_qregci##qmode (__o, \
++ (signedtype) __temp.val[2], \
++ 2); \
++ __o = __builtin_aarch64_ld3_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ __b.val[0] = (vectype) __builtin_aarch64_get_dregcidi (__o, 0); \
++ __b.val[1] = (vectype) __builtin_aarch64_get_dregcidi (__o, 1); \
++ __b.val[2] = (vectype) __builtin_aarch64_get_dregcidi (__o, 2); \
++ return __b; \
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vld1q_dup_s64 (const int64_t* __a)
+-{
+- return vdupq_n_s64 (*__a);
+-}
++__LD3_LANE_FUNC (float16x4x3_t, float16x4_t, float16x8x3_t, float16_t, v4hf,
++ v8hf, hf, f16, float16x8_t)
++__LD3_LANE_FUNC (float32x2x3_t, float32x2_t, float32x4x3_t, float32_t, v2sf, v4sf,
++ sf, f32, float32x4_t)
++__LD3_LANE_FUNC (float64x1x3_t, float64x1_t, float64x2x3_t, float64_t, df, v2df,
++ df, f64, float64x2_t)
++__LD3_LANE_FUNC (poly8x8x3_t, poly8x8_t, poly8x16x3_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__LD3_LANE_FUNC (poly16x4x3_t, poly16x4_t, poly16x8x3_t, poly16_t, v4hi, v8hi, hi,
++ p16, int16x8_t)
++__LD3_LANE_FUNC (int8x8x3_t, int8x8_t, int8x16x3_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__LD3_LANE_FUNC (int16x4x3_t, int16x4_t, int16x8x3_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__LD3_LANE_FUNC (int32x2x3_t, int32x2_t, int32x4x3_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__LD3_LANE_FUNC (int64x1x3_t, int64x1_t, int64x2x3_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__LD3_LANE_FUNC (uint8x8x3_t, uint8x8_t, uint8x16x3_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__LD3_LANE_FUNC (uint16x4x3_t, uint16x4_t, uint16x8x3_t, uint16_t, v4hi, v8hi, hi,
++ u16, int16x8_t)
++__LD3_LANE_FUNC (uint32x2x3_t, uint32x2_t, uint32x4x3_t, uint32_t, v2si, v4si, si,
++ u32, int32x4_t)
++__LD3_LANE_FUNC (uint64x1x3_t, uint64x1_t, uint64x2x3_t, uint64_t, di, v2di, di,
++ u64, int64x2_t)
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vld1q_dup_u8 (const uint8_t* __a)
+-{
+- return vdupq_n_u8 (*__a);
+-}
++#undef __LD3_LANE_FUNC
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vld1q_dup_u16 (const uint16_t* __a)
+-{
+- return vdupq_n_u16 (*__a);
+-}
++/* vld3q_lane */
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vld1q_dup_u32 (const uint32_t* __a)
+-{
+- return vdupq_n_u32 (*__a);
++#define __LD3_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld3q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_ci __o; \
++ intype ret; \
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); \
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); \
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); \
++ __o = __builtin_aarch64_ld3_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ ret.val[0] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 0); \
++ ret.val[1] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 1); \
++ ret.val[2] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 2); \
++ return ret; \
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vld1q_dup_u64 (const uint64_t* __a)
+-{
+- return vdupq_n_u64 (*__a);
+-}
++__LD3_LANE_FUNC (float16x8x3_t, float16x8_t, float16_t, v8hf, hf, f16)
++__LD3_LANE_FUNC (float32x4x3_t, float32x4_t, float32_t, v4sf, sf, f32)
++__LD3_LANE_FUNC (float64x2x3_t, float64x2_t, float64_t, v2df, df, f64)
++__LD3_LANE_FUNC (poly8x16x3_t, poly8x16_t, poly8_t, v16qi, qi, p8)
++__LD3_LANE_FUNC (poly16x8x3_t, poly16x8_t, poly16_t, v8hi, hi, p16)
++__LD3_LANE_FUNC (int8x16x3_t, int8x16_t, int8_t, v16qi, qi, s8)
++__LD3_LANE_FUNC (int16x8x3_t, int16x8_t, int16_t, v8hi, hi, s16)
++__LD3_LANE_FUNC (int32x4x3_t, int32x4_t, int32_t, v4si, si, s32)
++__LD3_LANE_FUNC (int64x2x3_t, int64x2_t, int64_t, v2di, di, s64)
++__LD3_LANE_FUNC (uint8x16x3_t, uint8x16_t, uint8_t, v16qi, qi, u8)
++__LD3_LANE_FUNC (uint16x8x3_t, uint16x8_t, uint16_t, v8hi, hi, u16)
++__LD3_LANE_FUNC (uint32x4x3_t, uint32x4_t, uint32_t, v4si, si, u32)
++__LD3_LANE_FUNC (uint64x2x3_t, uint64x2_t, uint64_t, v2di, di, u64)
+
+-/* vld1_lane */
++#undef __LD3_LANE_FUNC
+
+-__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+-vld1_lane_f16 (const float16_t *__src, float16x4_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++/* vld4_lane */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vld1_lane_f32 (const float32_t *__src, float32x2_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++#define __LD4_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
++ qmode, ptrmode, funcsuffix, signedtype) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld4_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_xi __o; \
++ largetype __temp; \
++ __temp.val[0] = \
++ vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
++ __temp.val[1] = \
++ vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
++ __temp.val[2] = \
++ vcombine_##funcsuffix (__b.val[2], vcreate_##funcsuffix (0)); \
++ __temp.val[3] = \
++ vcombine_##funcsuffix (__b.val[3], vcreate_##funcsuffix (0)); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[0], \
++ 0); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[1], \
++ 1); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[2], \
++ 2); \
++ __o = __builtin_aarch64_set_qregxi##qmode (__o, \
++ (signedtype) __temp.val[3], \
++ 3); \
++ __o = __builtin_aarch64_ld4_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \
++ __b.val[1] = (vectype) __builtin_aarch64_get_dregxidi (__o, 1); \
++ __b.val[2] = (vectype) __builtin_aarch64_get_dregxidi (__o, 2); \
++ __b.val[3] = (vectype) __builtin_aarch64_get_dregxidi (__o, 3); \
++ return __b; \
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vld1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++/* vld4q_lane */
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vld1_lane_p8 (const poly8_t *__src, poly8x8_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++__LD4_LANE_FUNC (float16x4x4_t, float16x4_t, float16x8x4_t, float16_t, v4hf,
++ v8hf, hf, f16, float16x8_t)
++__LD4_LANE_FUNC (float32x2x4_t, float32x2_t, float32x4x4_t, float32_t, v2sf, v4sf,
++ sf, f32, float32x4_t)
++__LD4_LANE_FUNC (float64x1x4_t, float64x1_t, float64x2x4_t, float64_t, df, v2df,
++ df, f64, float64x2_t)
++__LD4_LANE_FUNC (poly8x8x4_t, poly8x8_t, poly8x16x4_t, poly8_t, v8qi, v16qi, qi, p8,
++ int8x16_t)
++__LD4_LANE_FUNC (poly16x4x4_t, poly16x4_t, poly16x8x4_t, poly16_t, v4hi, v8hi, hi,
++ p16, int16x8_t)
++__LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v8qi, v16qi, qi, s8,
++ int8x16_t)
++__LD4_LANE_FUNC (int16x4x4_t, int16x4_t, int16x8x4_t, int16_t, v4hi, v8hi, hi, s16,
++ int16x8_t)
++__LD4_LANE_FUNC (int32x2x4_t, int32x2_t, int32x4x4_t, int32_t, v2si, v4si, si, s32,
++ int32x4_t)
++__LD4_LANE_FUNC (int64x1x4_t, int64x1_t, int64x2x4_t, int64_t, di, v2di, di, s64,
++ int64x2_t)
++__LD4_LANE_FUNC (uint8x8x4_t, uint8x8_t, uint8x16x4_t, uint8_t, v8qi, v16qi, qi, u8,
++ int8x16_t)
++__LD4_LANE_FUNC (uint16x4x4_t, uint16x4_t, uint16x8x4_t, uint16_t, v4hi, v8hi, hi,
++ u16, int16x8_t)
++__LD4_LANE_FUNC (uint32x2x4_t, uint32x2_t, uint32x4x4_t, uint32_t, v2si, v4si, si,
++ u32, int32x4_t)
++__LD4_LANE_FUNC (uint64x1x4_t, uint64x1_t, uint64x2x4_t, uint64_t, di, v2di, di,
++ u64, int64x2_t)
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vld1_lane_p16 (const poly16_t *__src, poly16x4_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++#undef __LD4_LANE_FUNC
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vld1_lane_s8 (const int8_t *__src, int8x8_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++/* vld4q_lane */
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vld1_lane_s16 (const int16_t *__src, int16x4_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++#define __LD4_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
++__extension__ extern __inline intype \
++__attribute__ ((__always_inline__, __gnu_inline__,__artificial__)) \
++vld4q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
++{ \
++ __builtin_aarch64_simd_xi __o; \
++ intype ret; \
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); \
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); \
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); \
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); \
++ __o = __builtin_aarch64_ld4_lane##mode ( \
++ (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
++ ret.val[0] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 0); \
++ ret.val[1] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 1); \
++ ret.val[2] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 2); \
++ ret.val[3] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 3); \
++ return ret; \
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vld1_lane_s32 (const int32_t *__src, int32x2_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++__LD4_LANE_FUNC (float16x8x4_t, float16x8_t, float16_t, v8hf, hf, f16)
++__LD4_LANE_FUNC (float32x4x4_t, float32x4_t, float32_t, v4sf, sf, f32)
++__LD4_LANE_FUNC (float64x2x4_t, float64x2_t, float64_t, v2df, df, f64)
++__LD4_LANE_FUNC (poly8x16x4_t, poly8x16_t, poly8_t, v16qi, qi, p8)
++__LD4_LANE_FUNC (poly16x8x4_t, poly16x8_t, poly16_t, v8hi, hi, p16)
++__LD4_LANE_FUNC (int8x16x4_t, int8x16_t, int8_t, v16qi, qi, s8)
++__LD4_LANE_FUNC (int16x8x4_t, int16x8_t, int16_t, v8hi, hi, s16)
++__LD4_LANE_FUNC (int32x4x4_t, int32x4_t, int32_t, v4si, si, s32)
++__LD4_LANE_FUNC (int64x2x4_t, int64x2_t, int64_t, v2di, di, s64)
++__LD4_LANE_FUNC (uint8x16x4_t, uint8x16_t, uint8_t, v16qi, qi, u8)
++__LD4_LANE_FUNC (uint16x8x4_t, uint16x8_t, uint16_t, v8hi, hi, u16)
++__LD4_LANE_FUNC (uint32x4x4_t, uint32x4_t, uint32_t, v4si, si, u32)
++__LD4_LANE_FUNC (uint64x2x4_t, uint64x2_t, uint64_t, v2di, di, u64)
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vld1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++#undef __LD4_LANE_FUNC
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vld1_lane_u8 (const uint8_t *__src, uint8x8_t __vec, const int __lane)
+-{
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
+-}
++/* vmax */
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vld1_lane_u16 (const uint16_t *__src, uint16x4_t __vec, const int __lane)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smax_nanv2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vld1_lane_u32 (const uint32_t *__src, uint32x2_t __vec, const int __lane)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (float64x1_t)
++ { __builtin_aarch64_smax_nandf (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0)) };
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vld1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv8qi (__a, __b);
+ }
+
+-/* vld1q_lane */
+-
+-__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+-vld1q_lane_f16 (const float16_t *__src, float16x8_t __vec, const int __lane)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv4hi (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vld1q_lane_f32 (const float32_t *__src, float32x4_t __vec, const int __lane)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv2si (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vld1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint8x8_t) __builtin_aarch64_umaxv8qi ((int8x8_t) __a,
++ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vld1q_lane_p8 (const poly8_t *__src, poly8x16_t __vec, const int __lane)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint16x4_t) __builtin_aarch64_umaxv4hi ((int16x4_t) __a,
++ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vld1q_lane_p16 (const poly16_t *__src, poly16x8_t __vec, const int __lane)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint32x2_t) __builtin_aarch64_umaxv2si ((int32x2_t) __a,
++ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vld1q_lane_s8 (const int8_t *__src, int8x16_t __vec, const int __lane)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smax_nanv4sf (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vld1q_lane_s16 (const int16_t *__src, int16x8_t __vec, const int __lane)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smax_nanv2df (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vld1q_lane_s32 (const int32_t *__src, int32x4_t __vec, const int __lane)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vld1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv8hi (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vld1q_lane_u8 (const uint8_t *__src, uint8x16_t __vec, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return __builtin_aarch64_smaxv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vld1q_lane_u16 (const uint16_t *__src, uint16x8_t __vec, const int __lane)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint8x16_t) __builtin_aarch64_umaxv16qi ((int8x16_t) __a,
++ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vld1q_lane_u32 (const uint32_t *__src, uint32x4_t __vec, const int __lane)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint16x8_t) __builtin_aarch64_umaxv8hi ((int16x8_t) __a,
++ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vld1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- return __aarch64_vset_lane_any (*__src, __vec, __lane);
++ return (uint32x4_t) __builtin_aarch64_umaxv4si ((int32x4_t) __a,
++ (int32x4_t) __b);
+ }
++/* vmulx */
+
+-/* vldn */
+-
+-__extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__))
+-vld2_s64 (const int64_t * __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- int64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
+- return ret;
++ return __builtin_aarch64_fmulxv2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1x2_t __attribute__ ((__always_inline__))
+-vld2_u64 (const uint64_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- uint64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
+- return ret;
++ return __builtin_aarch64_fmulxv4sf (__a, __b);
+ }
+
+-__extension__ static __inline float64x1x2_t __attribute__ ((__always_inline__))
+-vld2_f64 (const float64_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- float64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)};
+- return ret;
++ return (float64x1_t) {__builtin_aarch64_fmulxdf (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__))
+-vld2_s8 (const int8_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- int8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_fmulxv2df (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__))
+-vld2_p8 (const poly8_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxs_f32 (float32_t __a, float32_t __b)
+ {
+- poly8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_fmulxsf (__a, __b);
+ }
+
+-__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__))
+-vld2_s16 (const int16_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxd_f64 (float64_t __a, float64_t __b)
+ {
+- int16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_fmulxdf (__a, __b);
+ }
+
+-__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__))
+-vld2_p16 (const poly16_t * __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_lane_f32 (float32x2_t __a, float32x2_t __v, const int __lane)
+ {
+- poly16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return vmulx_f32 (__a, __aarch64_vdup_lane_f32 (__v, __lane));
+ }
+
+-__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__))
+-vld2_s32 (const int32_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_lane_f64 (float64x1_t __a, float64x1_t __v, const int __lane)
+ {
+- int32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
+- return ret;
++ return vmulx_f64 (__a, __aarch64_vdup_lane_f64 (__v, __lane));
+ }
+
+-__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__))
+-vld2_u8 (const uint8_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_lane_f32 (float32x4_t __a, float32x2_t __v, const int __lane)
+ {
+- uint8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return vmulxq_f32 (__a, __aarch64_vdupq_lane_f32 (__v, __lane));
+ }
+
+-__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__))
+-vld2_u16 (const uint16_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_lane_f64 (float64x2_t __a, float64x1_t __v, const int __lane)
+ {
+- uint16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return vmulxq_f64 (__a, __aarch64_vdupq_lane_f64 (__v, __lane));
+ }
+
+-__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__))
+-vld2_u32 (const uint32_t * __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_laneq_f32 (float32x2_t __a, float32x4_t __v, const int __lane)
+ {
+- uint32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
+- return ret;
++ return vmulx_f32 (__a, __aarch64_vdup_laneq_f32 (__v, __lane));
+ }
+
+-__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+-vld2_f16 (const float16_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_laneq_f64 (float64x1_t __a, float64x2_t __v, const int __lane)
+ {
+- float16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4hf (__a);
+- ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_dregoiv4hf (__o, 1);
+- return ret;
++ return vmulx_f64 (__a, __aarch64_vdup_laneq_f64 (__v, __lane));
+ }
+
+-__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
+-vld2_f32 (const float32_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_laneq_f32 (float32x4_t __a, float32x4_t __v, const int __lane)
+ {
+- float32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1);
+- return ret;
++ return vmulxq_f32 (__a, __aarch64_vdupq_laneq_f32 (__v, __lane));
+ }
+
+-__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_s8 (const int8_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_laneq_f64 (float64x2_t __a, float64x2_t __v, const int __lane)
+ {
+- int8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return vmulxq_f64 (__a, __aarch64_vdupq_laneq_f64 (__v, __lane));
+ }
+
+-__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_p8 (const poly8_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxs_lane_f32 (float32_t __a, float32x2_t __v, const int __lane)
+ {
+- poly8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return vmulxs_f32 (__a, __aarch64_vget_lane_any (__v, __lane));
+ }
+
+-__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_s16 (const int16_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxs_laneq_f32 (float32_t __a, float32x4_t __v, const int __lane)
+ {
+- int16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return vmulxs_f32 (__a, __aarch64_vget_lane_any (__v, __lane));
+ }
+
+-__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_p16 (const poly16_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxd_lane_f64 (float64_t __a, float64x1_t __v, const int __lane)
+ {
+- poly16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return vmulxd_f64 (__a, __aarch64_vget_lane_any (__v, __lane));
+ }
+
+-__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_s32 (const int32_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxd_laneq_f64 (float64_t __a, float64x2_t __v, const int __lane)
+ {
+- int32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
+- return ret;
++ return vmulxd_f64 (__a, __aarch64_vget_lane_any (__v, __lane));
+ }
+
+-__extension__ static __inline int64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_s64 (const int64_t * __a)
+-{
+- int64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
+- return ret;
++/* vpmax */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_s8 (int8x8_t a, int8x8_t b)
++{
++ return __builtin_aarch64_smaxpv8qi (a, b);
+ }
+
+-__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_u8 (const uint8_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_s16 (int16x4_t a, int16x4_t b)
+ {
+- uint8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_smaxpv4hi (a, b);
+ }
+
+-__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_u16 (const uint16_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_s32 (int32x2_t a, int32x2_t b)
+ {
+- uint16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_smaxpv2si (a, b);
+ }
+
+-__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_u32 (const uint32_t * __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_u8 (uint8x8_t a, uint8x8_t b)
+ {
+- uint32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
+- return ret;
++ return (uint8x8_t) __builtin_aarch64_umaxpv8qi ((int8x8_t) a,
++ (int8x8_t) b);
+ }
+
+-__extension__ static __inline uint64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_u64 (const uint64_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_u16 (uint16x4_t a, uint16x4_t b)
+ {
+- uint64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
+- return ret;
++ return (uint16x4_t) __builtin_aarch64_umaxpv4hi ((int16x4_t) a,
++ (int16x4_t) b);
+ }
+
+-__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_f16 (const float16_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_u32 (uint32x2_t a, uint32x2_t b)
+ {
+- float16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v8hf (__a);
+- ret.val[0] = __builtin_aarch64_get_qregoiv8hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
+- return ret;
++ return (uint32x2_t) __builtin_aarch64_umaxpv2si ((int32x2_t) a,
++ (int32x2_t) b);
+ }
+
+-__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_f32 (const float32_t * __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_s8 (int8x16_t a, int8x16_t b)
+ {
+- float32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1);
+- return ret;
++ return __builtin_aarch64_smaxpv16qi (a, b);
+ }
+
+-__extension__ static __inline float64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_f64 (const float64_t * __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_s16 (int16x8_t a, int16x8_t b)
+ {
+- float64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2v2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1);
+- return ret;
++ return __builtin_aarch64_smaxpv8hi (a, b);
+ }
+
+-__extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__))
+-vld3_s64 (const int64_t * __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_s32 (int32x4_t a, int32x4_t b)
+ {
+- int64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
+- ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
+- return ret;
++ return __builtin_aarch64_smaxpv4si (a, b);
+ }
+
+-__extension__ static __inline uint64x1x3_t __attribute__ ((__always_inline__))
+-vld3_u64 (const uint64_t * __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_u8 (uint8x16_t a, uint8x16_t b)
+ {
+- uint64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
+- ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
+- return ret;
++ return (uint8x16_t) __builtin_aarch64_umaxpv16qi ((int8x16_t) a,
++ (int8x16_t) b);
+ }
+
+-__extension__ static __inline float64x1x3_t __attribute__ ((__always_inline__))
+-vld3_f64 (const float64_t * __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_u16 (uint16x8_t a, uint16x8_t b)
+ {
+- float64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)};
+- ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)};
+- return ret;
++ return (uint16x8_t) __builtin_aarch64_umaxpv8hi ((int16x8_t) a,
++ (int16x8_t) b);
+ }
+
+-__extension__ static __inline int8x8x3_t __attribute__ ((__always_inline__))
+-vld3_s8 (const int8_t * __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_u32 (uint32x4_t a, uint32x4_t b)
+ {
+- int8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return (uint32x4_t) __builtin_aarch64_umaxpv4si ((int32x4_t) a,
++ (int32x4_t) b);
+ }
+
+-__extension__ static __inline poly8x8x3_t __attribute__ ((__always_inline__))
+-vld3_p8 (const poly8_t * __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_f32 (float32x2_t a, float32x2_t b)
+ {
+- poly8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_smax_nanpv2sf (a, b);
+ }
+
+-__extension__ static __inline int16x4x3_t __attribute__ ((__always_inline__))
+-vld3_s16 (const int16_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_f32 (float32x4_t a, float32x4_t b)
+ {
+- int16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_smax_nanpv4sf (a, b);
+ }
+
+-__extension__ static __inline poly16x4x3_t __attribute__ ((__always_inline__))
+-vld3_p16 (const poly16_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_f64 (float64x2_t a, float64x2_t b)
+ {
+- poly16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_smax_nanpv2df (a, b);
+ }
+
+-__extension__ static __inline int32x2x3_t __attribute__ ((__always_inline__))
+-vld3_s32 (const int32_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxqd_f64 (float64x2_t a)
+ {
+- int32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
+- ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smax_nan_scal_v2df (a);
+ }
+
+-__extension__ static __inline uint8x8x3_t __attribute__ ((__always_inline__))
+-vld3_u8 (const uint8_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxs_f32 (float32x2_t a)
+ {
+- uint8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smax_nan_scal_v2sf (a);
+ }
+
+-__extension__ static __inline uint16x4x3_t __attribute__ ((__always_inline__))
+-vld3_u16 (const uint16_t * __a)
++/* vpmaxnm */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnm_f32 (float32x2_t a, float32x2_t b)
+ {
+- uint16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_smaxpv2sf (a, b);
+ }
+
+-__extension__ static __inline uint32x2x3_t __attribute__ ((__always_inline__))
+-vld3_u32 (const uint32_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
+ {
+- uint32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
+- ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
+- return ret;
++ return __builtin_aarch64_smaxpv4sf (a, b);
+ }
+
+-__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+-vld3_f16 (const float16_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnmq_f64 (float64x2_t a, float64x2_t b)
+ {
+- float16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4hf (__a);
+- ret.val[0] = __builtin_aarch64_get_dregciv4hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_dregciv4hf (__o, 1);
+- ret.val[2] = __builtin_aarch64_get_dregciv4hf (__o, 2);
+- return ret;
++ return __builtin_aarch64_smaxpv2df (a, b);
+ }
+
+-__extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
+-vld3_f32 (const float32_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnmqd_f64 (float64x2_t a)
+ {
+- float32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1);
+- ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v2df (a);
+ }
+
+-__extension__ static __inline int8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_s8 (const int8_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnms_f32 (float32x2_t a)
+ {
+- int8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v2sf (a);
+ }
+
+-__extension__ static __inline poly8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_p8 (const poly8_t * __a)
++/* vpmin */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_s8 (int8x8_t a, int8x8_t b)
+ {
+- poly8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv8qi (a, b);
+ }
+
+-__extension__ static __inline int16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_s16 (const int16_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_s16 (int16x4_t a, int16x4_t b)
+ {
+- int16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv4hi (a, b);
+ }
+
+-__extension__ static __inline poly16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_p16 (const poly16_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_s32 (int32x2_t a, int32x2_t b)
+ {
+- poly16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv2si (a, b);
+ }
+
+-__extension__ static __inline int32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_s32 (const int32_t * __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_u8 (uint8x8_t a, uint8x8_t b)
+ {
+- int32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
+- ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
+- return ret;
++ return (uint8x8_t) __builtin_aarch64_uminpv8qi ((int8x8_t) a,
++ (int8x8_t) b);
+ }
+
+-__extension__ static __inline int64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_s64 (const int64_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_u16 (uint16x4_t a, uint16x4_t b)
+ {
+- int64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
+- ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
+- return ret;
++ return (uint16x4_t) __builtin_aarch64_uminpv4hi ((int16x4_t) a,
++ (int16x4_t) b);
+ }
+
+-__extension__ static __inline uint8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_u8 (const uint8_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_u32 (uint32x2_t a, uint32x2_t b)
+ {
+- uint8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return (uint32x2_t) __builtin_aarch64_uminpv2si ((int32x2_t) a,
++ (int32x2_t) b);
+ }
+
+-__extension__ static __inline uint16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_u16 (const uint16_t * __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_s8 (int8x16_t a, int8x16_t b)
+ {
+- uint16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv16qi (a, b);
+ }
+
+-__extension__ static __inline uint32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_u32 (const uint32_t * __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_s16 (int16x8_t a, int16x8_t b)
+ {
+- uint32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
+- ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv8hi (a, b);
+ }
+
+-__extension__ static __inline uint64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_u64 (const uint64_t * __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_s32 (int32x4_t a, int32x4_t b)
+ {
+- uint64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
+- ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
+- return ret;
++ return __builtin_aarch64_sminpv4si (a, b);
+ }
+
+-__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_f16 (const float16_t * __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_u8 (uint8x16_t a, uint8x16_t b)
+ {
+- float16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v8hf (__a);
+- ret.val[0] = __builtin_aarch64_get_qregciv8hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_qregciv8hf (__o, 1);
+- ret.val[2] = __builtin_aarch64_get_qregciv8hf (__o, 2);
+- return ret;
++ return (uint8x16_t) __builtin_aarch64_uminpv16qi ((int8x16_t) a,
++ (int8x16_t) b);
+ }
+
+-__extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_f32 (const float32_t * __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_u16 (uint16x8_t a, uint16x8_t b)
+ {
+- float32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1);
+- ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2);
+- return ret;
++ return (uint16x8_t) __builtin_aarch64_uminpv8hi ((int16x8_t) a,
++ (int16x8_t) b);
+ }
+
+-__extension__ static __inline float64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_f64 (const float64_t * __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_u32 (uint32x4_t a, uint32x4_t b)
+ {
+- float64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3v2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1);
+- ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2);
+- return ret;
++ return (uint32x4_t) __builtin_aarch64_uminpv4si ((int32x4_t) a,
++ (int32x4_t) b);
+ }
+
+-__extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__))
+-vld4_s64 (const int64_t * __a)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_f32 (float32x2_t a, float32x2_t b)
+ {
+- int64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
+- ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
+- ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
+- return ret;
++ return __builtin_aarch64_smin_nanpv2sf (a, b);
+ }
+
+-__extension__ static __inline uint64x1x4_t __attribute__ ((__always_inline__))
+-vld4_u64 (const uint64_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_f32 (float32x4_t a, float32x4_t b)
+ {
+- uint64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
+- ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
+- ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
+- return ret;
++ return __builtin_aarch64_smin_nanpv4sf (a, b);
+ }
+
+-__extension__ static __inline float64x1x4_t __attribute__ ((__always_inline__))
+-vld4_f64 (const float64_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_f64 (float64x2_t a, float64x2_t b)
+ {
+- float64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)};
+- ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)};
+- ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)};
+- return ret;
++ return __builtin_aarch64_smin_nanpv2df (a, b);
+ }
+
+-__extension__ static __inline int8x8x4_t __attribute__ ((__always_inline__))
+-vld4_s8 (const int8_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminqd_f64 (float64x2_t a)
+ {
+- int8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smin_nan_scal_v2df (a);
+ }
+
+-__extension__ static __inline poly8x8x4_t __attribute__ ((__always_inline__))
+-vld4_p8 (const poly8_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmins_f32 (float32x2_t a)
+ {
+- poly8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smin_nan_scal_v2sf (a);
+ }
+
+-__extension__ static __inline int16x4x4_t __attribute__ ((__always_inline__))
+-vld4_s16 (const int16_t * __a)
++/* vpminnm */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnm_f32 (float32x2_t a, float32x2_t b)
+ {
+- int16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_sminpv2sf (a, b);
+ }
+
+-__extension__ static __inline poly16x4x4_t __attribute__ ((__always_inline__))
+-vld4_p16 (const poly16_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnmq_f32 (float32x4_t a, float32x4_t b)
+ {
+- poly16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_sminpv4sf (a, b);
+ }
+
+-__extension__ static __inline int32x2x4_t __attribute__ ((__always_inline__))
+-vld4_s32 (const int32_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnmq_f64 (float64x2_t a, float64x2_t b)
+ {
+- int32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
+- ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
+- ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
+- return ret;
++ return __builtin_aarch64_sminpv2df (a, b);
+ }
+
+-__extension__ static __inline uint8x8x4_t __attribute__ ((__always_inline__))
+-vld4_u8 (const uint8_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnmqd_f64 (float64x2_t a)
+ {
+- uint8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v2df (a);
+ }
+
+-__extension__ static __inline uint16x4x4_t __attribute__ ((__always_inline__))
+-vld4_u16 (const uint16_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnms_f32 (float32x2_t a)
+ {
+- uint16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v2sf (a);
+ }
+
+-__extension__ static __inline uint32x2x4_t __attribute__ ((__always_inline__))
+-vld4_u32 (const uint32_t * __a)
++/* vmaxnm */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- uint32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
+- ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
+- ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
+- return ret;
++ return __builtin_aarch64_fmaxv2sf (__a, __b);
+ }
+
+-__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+-vld4_f16 (const float16_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnm_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- float16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4hf (__a);
+- ret.val[0] = __builtin_aarch64_get_dregxiv4hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_dregxiv4hf (__o, 1);
+- ret.val[2] = __builtin_aarch64_get_dregxiv4hf (__o, 2);
+- ret.val[3] = __builtin_aarch64_get_dregxiv4hf (__o, 3);
+- return ret;
++ return (float64x1_t)
++ { __builtin_aarch64_fmaxdf (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0)) };
+ }
+
+-__extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
+-vld4_f32 (const float32_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- float32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1);
+- ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2);
+- ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3);
+- return ret;
++ return __builtin_aarch64_fmaxv4sf (__a, __b);
+ }
+
+-__extension__ static __inline int8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_s8 (const int8_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- int8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_fmaxv2df (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_p8 (const poly8_t * __a)
++/* vmaxv */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_f32 (float32x2_t __a)
+ {
+- poly8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_nan_scal_v2sf (__a);
+ }
+
+-__extension__ static __inline int16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_s16 (const int16_t * __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_s8 (int8x8_t __a)
+ {
+- int16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v8qi (__a);
+ }
+
+-__extension__ static __inline poly16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_p16 (const poly16_t * __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_s16 (int16x4_t __a)
+ {
+- poly16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v4hi (__a);
+ }
+
+-__extension__ static __inline int32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_s32 (const int32_t * __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_s32 (int32x2_t __a)
+ {
+- int32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
+- ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
+- ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v2si (__a);
+ }
+
+-__extension__ static __inline int64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_s64 (const int64_t * __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_u8 (uint8x8_t __a)
+ {
+- int64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
+- ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
+- ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v8qi_uu (__a);
+ }
+
+-__extension__ static __inline uint8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_u8 (const uint8_t * __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_u16 (uint16x4_t __a)
+ {
+- uint8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v4hi_uu (__a);
+ }
+
+-__extension__ static __inline uint16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_u16 (const uint16_t * __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_u32 (uint32x2_t __a)
+ {
+- uint16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v2si_uu (__a);
+ }
+
+-__extension__ static __inline uint32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_u32 (const uint32_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_f32 (float32x4_t __a)
+ {
+- uint32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
+- ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
+- ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_nan_scal_v4sf (__a);
+ }
+
+-__extension__ static __inline uint64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_u64 (const uint64_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_f64 (float64x2_t __a)
+ {
+- uint64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
+- ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
+- ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_nan_scal_v2df (__a);
+ }
+
+-__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_f16 (const float16_t * __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_s8 (int8x16_t __a)
+ {
+- float16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v8hf (__a);
+- ret.val[0] = __builtin_aarch64_get_qregxiv8hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_qregxiv8hf (__o, 1);
+- ret.val[2] = __builtin_aarch64_get_qregxiv8hf (__o, 2);
+- ret.val[3] = __builtin_aarch64_get_qregxiv8hf (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v16qi (__a);
+ }
+
+-__extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_f32 (const float32_t * __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_s16 (int16x8_t __a)
+ {
+- float32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1);
+- ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2);
+- ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v8hi (__a);
+ }
+
+-__extension__ static __inline float64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_f64 (const float64_t * __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_s32 (int32x4_t __a)
+ {
+- float64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4v2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1);
+- ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2);
+- ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v4si (__a);
+ }
+
+-/* vldn_dup */
+-
+-__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__))
+-vld2_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_u8 (uint8x16_t __a)
+ {
+- int8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v16qi_uu (__a);
+ }
+
+-__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__))
+-vld2_dup_s16 (const int16_t * __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_u16 (uint16x8_t __a)
+ {
+- int16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v8hi_uu (__a);
+ }
+
+-__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__))
+-vld2_dup_s32 (const int32_t * __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_u32 (uint32x4_t __a)
+ {
+- int32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_umax_scal_v4si_uu (__a);
+ }
+
+-__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
+-vld2_dup_f16 (const float16_t * __a)
++/* vmaxnmv */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmv_f32 (float32x2_t __a)
+ {
+- float16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = __builtin_aarch64_get_dregoiv4hf (__o, 0);
+- ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregoiv4hf (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v2sf (__a);
+ }
+
+-__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
+-vld2_dup_f32 (const float32_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmvq_f32 (float32x4_t __a)
+ {
+- float32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregoiv2sf (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v4sf (__a);
+ }
+
+-__extension__ static __inline float64x1x2_t __attribute__ ((__always_inline__))
+-vld2_dup_f64 (const float64_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmvq_f64 (float64x2_t __a)
+ {
+- float64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rdf ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregoidf (__o, 1)};
+- return ret;
++ return __builtin_aarch64_reduc_smax_scal_v2df (__a);
+ }
+
+-__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__))
+-vld2_dup_u8 (const uint8_t * __a)
++/* vmin */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- uint8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_smin_nanv2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__))
+-vld2_dup_u16 (const uint16_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- uint16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return (float64x1_t)
++ { __builtin_aarch64_smin_nandf (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0)) };
+ }
+
+-__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__))
+-vld2_dup_u32 (const uint32_t * __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- uint32x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregoiv2si (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv8qi (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__))
+-vld2_dup_p8 (const poly8_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- poly8x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregoiv8qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv4hi (__a, __b);
+ }
+
+-__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__))
+-vld2_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- poly16x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregoiv4hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv2si (__a, __b);
+ }
+
+-__extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__))
+-vld2_dup_s64 (const int64_t * __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- int64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
+- return ret;
++ return (uint8x8_t) __builtin_aarch64_uminv8qi ((int8x8_t) __a,
++ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint64x1x2_t __attribute__ ((__always_inline__))
+-vld2_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- uint64x1x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregoidi (__o, 1);
+- return ret;
++ return (uint16x4_t) __builtin_aarch64_uminv4hi ((int16x4_t) __a,
++ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- int8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return (uint32x2_t) __builtin_aarch64_uminv2si ((int32x2_t) __a,
++ (int32x2_t) __b);
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_f32 (float32x4_t __a, float32x4_t __b)
++{
++ return __builtin_aarch64_smin_nanv4sf (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_p8 (const poly8_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- poly8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return __builtin_aarch64_smin_nanv2df (__a, __b);
+ }
+
+-__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_s16 (const int16_t * __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- int16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv16qi (__a, __b);
+ }
+
+-__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- poly16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_s32 (const int32_t * __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- int32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
+- return ret;
++ return __builtin_aarch64_sminv4si (__a, __b);
+ }
+
+-__extension__ static __inline int64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_s64 (const int64_t * __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- int64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
+- return ret;
++ return (uint8x16_t) __builtin_aarch64_uminv16qi ((int8x16_t) __a,
++ (int8x16_t) __b);
+ }
+
+-__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_u8 (const uint8_t * __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- uint8x16x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregoiv16qi (__o, 1);
+- return ret;
++ return (uint16x8_t) __builtin_aarch64_uminv8hi ((int16x8_t) __a,
++ (int16x8_t) __b);
+ }
+
+-__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_u16 (const uint16_t * __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- uint16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregoiv8hi (__o, 1);
+- return ret;
++ return (uint32x4_t) __builtin_aarch64_uminv4si ((int32x4_t) __a,
++ (int32x4_t) __b);
+ }
+
+-__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_u32 (const uint32_t * __a)
++/* vminnm */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnm_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- uint32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregoiv4si (__o, 1);
+- return ret;
++ return __builtin_aarch64_fminv2sf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnm_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- uint64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregoiv2di (__o, 1);
+- return ret;
++ return (float64x1_t)
++ { __builtin_aarch64_fmindf (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0)) };
+ }
+
+-__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_f16 (const float16_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- float16x8x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregoiv8hf (__o, 0);
+- ret.val[1] = __builtin_aarch64_get_qregoiv8hf (__o, 1);
+- return ret;
++ return __builtin_aarch64_fminv4sf (__a, __b);
+ }
+
+-__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_f32 (const float32_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- float32x4x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregoiv4sf (__o, 1);
+- return ret;
++ return __builtin_aarch64_fminv2df (__a, __b);
+ }
+
+-__extension__ static __inline float64x2x2_t __attribute__ ((__always_inline__))
+-vld2q_dup_f64 (const float64_t * __a)
++/* vminv */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_f32 (float32x2_t __a)
+ {
+- float64x2x2_t ret;
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_ld2rv2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregoiv2df (__o, 1);
+- return ret;
++ return __builtin_aarch64_reduc_smin_nan_scal_v2sf (__a);
+ }
+
+-__extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__))
+-vld3_dup_s64 (const int64_t * __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_s8 (int8x8_t __a)
+ {
+- int64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
+- ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v8qi (__a);
+ }
+
+-__extension__ static __inline uint64x1x3_t __attribute__ ((__always_inline__))
+-vld3_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_s16 (int16x4_t __a)
+ {
+- uint64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 1);
+- ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregcidi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v4hi (__a);
+ }
+
+-__extension__ static __inline float64x1x3_t __attribute__ ((__always_inline__))
+-vld3_dup_f64 (const float64_t * __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_s32 (int32x2_t __a)
+ {
+- float64x1x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rdf ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 1)};
+- ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregcidf (__o, 2)};
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v2si (__a);
+ }
+
+-__extension__ static __inline int8x8x3_t __attribute__ ((__always_inline__))
+-vld3_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_u8 (uint8x8_t __a)
+ {
+- int8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v8qi_uu (__a);
+ }
+
+-__extension__ static __inline poly8x8x3_t __attribute__ ((__always_inline__))
+-vld3_dup_p8 (const poly8_t * __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_u16 (uint16x4_t __a)
+ {
+- poly8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v4hi_uu (__a);
+ }
+
+-__extension__ static __inline int16x4x3_t __attribute__ ((__always_inline__))
+-vld3_dup_s16 (const int16_t * __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_u32 (uint32x2_t __a)
+ {
+- int16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v2si_uu (__a);
+ }
+
+-__extension__ static __inline poly16x4x3_t __attribute__ ((__always_inline__))
+-vld3_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_f32 (float32x4_t __a)
+ {
+- poly16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_nan_scal_v4sf (__a);
+ }
+
+-__extension__ static __inline int32x2x3_t __attribute__ ((__always_inline__))
+-vld3_dup_s32 (const int32_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_f64 (float64x2_t __a)
+ {
+- int32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
+- ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_nan_scal_v2df (__a);
+ }
+
+-__extension__ static __inline uint8x8x3_t __attribute__ ((__always_inline__))
+-vld3_dup_u8 (const uint8_t * __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_s8 (int8x16_t __a)
+ {
+- uint8x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1);
+- ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v16qi (__a);
+ }
+
+-__extension__ static __inline uint16x4x3_t __attribute__ ((__always_inline__))
+-vld3_dup_u16 (const uint16_t * __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_s16 (int16x8_t __a)
+ {
+- uint16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1);
+- ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v8hi (__a);
+ }
+
+-__extension__ static __inline uint32x2x3_t __attribute__ ((__always_inline__))
+-vld3_dup_u32 (const uint32_t * __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_s32 (int32x4_t __a)
+ {
+- uint32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1);
+- ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v4si (__a);
+ }
+
+-__extension__ static __inline float16x4x3_t __attribute__ ((__always_inline__))
+-vld3_dup_f16 (const float16_t * __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_u8 (uint8x16_t __a)
+ {
+- float16x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0);
+- ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1);
+- ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v16qi_uu (__a);
+ }
+
+-__extension__ static __inline float32x2x3_t __attribute__ ((__always_inline__))
+-vld3_dup_f32 (const float32_t * __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_u16 (uint16x8_t __a)
+ {
+- float32x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1);
+- ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v8hi_uu (__a);
+ }
+
+-__extension__ static __inline int8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_u32 (uint32x4_t __a)
+ {
+- int8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_umin_scal_v4si_uu (__a);
+ }
+
+-__extension__ static __inline poly8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_p8 (const poly8_t * __a)
++/* vminnmv */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmv_f32 (float32x2_t __a)
+ {
+- poly8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v2sf (__a);
+ }
+
+-__extension__ static __inline int16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_s16 (const int16_t * __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmvq_f32 (float32x4_t __a)
+ {
+- int16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v4sf (__a);
+ }
+
+-__extension__ static __inline poly16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmvq_f64 (float64x2_t __a)
+ {
+- poly16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return __builtin_aarch64_reduc_smin_scal_v2df (__a);
+ }
+
+-__extension__ static __inline int32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_s32 (const int32_t * __a)
++/* vmla */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_f32 (float32x2_t a, float32x2_t b, float32x2_t c)
+ {
+- int32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
+- ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
+- return ret;
++ return a + b * c;
+ }
+
+-__extension__ static __inline int64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_s64 (const int64_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
+ {
+- int64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
+- ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
+- return ret;
++ return __a + __b * __c;
+ }
+
+-__extension__ static __inline uint8x16x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_u8 (const uint8_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_f32 (float32x4_t a, float32x4_t b, float32x4_t c)
+ {
+- uint8x16x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1);
+- ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2);
+- return ret;
++ return a + b * c;
+ }
+
+-__extension__ static __inline uint16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_u16 (const uint16_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_f64 (float64x2_t a, float64x2_t b, float64x2_t c)
+ {
+- uint16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1);
+- ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2);
+- return ret;
++ return a + b * c;
+ }
+
+-__extension__ static __inline uint32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_u32 (const uint32_t * __a)
++/* vmla_lane */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_lane_f32 (float32x2_t __a, float32x2_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- uint32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1);
+- ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_lane_s16 (int16x4_t __a, int16x4_t __b,
++ int16x4_t __c, const int __lane)
+ {
+- uint64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1);
+- ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float16x8x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_f16 (const float16_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_lane_s32 (int32x2_t __a, int32x2_t __b,
++ int32x2_t __c, const int __lane)
+ {
+- float16x8x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0);
+- ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1);
+- ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float32x4x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_f32 (const float32_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_lane_u16 (uint16x4_t __a, uint16x4_t __b,
++ uint16x4_t __c, const int __lane)
+ {
+- float32x4x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1);
+- ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float64x2x3_t __attribute__ ((__always_inline__))
+-vld3q_dup_f64 (const float64_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_lane_u32 (uint32x2_t __a, uint32x2_t __b,
++ uint32x2_t __c, const int __lane)
+ {
+- float64x2x3_t ret;
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_ld3rv2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1);
+- ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__))
+-vld4_dup_s64 (const int64_t * __a)
++/* vmla_laneq */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_laneq_f32 (float32x2_t __a, float32x2_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- int64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
+- ret.val[1] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
+- ret.val[2] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
+- ret.val[3] = (int64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint64x1x4_t __attribute__ ((__always_inline__))
+-vld4_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_laneq_s16 (int16x4_t __a, int16x4_t __b,
++ int16x8_t __c, const int __lane)
+ {
+- uint64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rdi ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 0);
+- ret.val[1] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 1);
+- ret.val[2] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 2);
+- ret.val[3] = (uint64x1_t) __builtin_aarch64_get_dregxidi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float64x1x4_t __attribute__ ((__always_inline__))
+-vld4_dup_f64 (const float64_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_laneq_s32 (int32x2_t __a, int32x2_t __b,
++ int32x4_t __c, const int __lane)
+ {
+- float64x1x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rdf ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 0)};
+- ret.val[1] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 1)};
+- ret.val[2] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 2)};
+- ret.val[3] = (float64x1_t) {__builtin_aarch64_get_dregxidf (__o, 3)};
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int8x8x4_t __attribute__ ((__always_inline__))
+-vld4_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_laneq_u16 (uint16x4_t __a, uint16x4_t __b,
++ uint16x8_t __c, const int __lane)
+ {
+- int8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (int8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline poly8x8x4_t __attribute__ ((__always_inline__))
+-vld4_dup_p8 (const poly8_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmla_laneq_u32 (uint32x2_t __a, uint32x2_t __b,
++ uint32x4_t __c, const int __lane)
+ {
+- poly8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (poly8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int16x4x4_t __attribute__ ((__always_inline__))
+-vld4_dup_s16 (const int16_t * __a)
++/* vmlaq_lane */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_lane_f32 (float32x4_t __a, float32x4_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- int16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (int16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline poly16x4x4_t __attribute__ ((__always_inline__))
+-vld4_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_lane_s16 (int16x8_t __a, int16x8_t __b,
++ int16x4_t __c, const int __lane)
+ {
+- poly16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (poly16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int32x2x4_t __attribute__ ((__always_inline__))
+-vld4_dup_s32 (const int32_t * __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_lane_s32 (int32x4_t __a, int32x4_t __b,
++ int32x2_t __c, const int __lane)
+ {
+- int32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
+- ret.val[1] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
+- ret.val[2] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
+- ret.val[3] = (int32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint8x8x4_t __attribute__ ((__always_inline__))
+-vld4_dup_u8 (const uint8_t * __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_lane_u16 (uint16x8_t __a, uint16x8_t __b,
++ uint16x4_t __c, const int __lane)
+ {
+- uint8x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 0);
+- ret.val[1] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 1);
+- ret.val[2] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 2);
+- ret.val[3] = (uint8x8_t) __builtin_aarch64_get_dregxiv8qi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint16x4x4_t __attribute__ ((__always_inline__))
+-vld4_dup_u16 (const uint16_t * __a)
+-{
+- uint16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 0);
+- ret.val[1] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 1);
+- ret.val[2] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 2);
+- ret.val[3] = (uint16x4_t) __builtin_aarch64_get_dregxiv4hi (__o, 3);
+- return ret;
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_lane_u32 (uint32x4_t __a, uint32x4_t __b,
++ uint32x2_t __c, const int __lane)
++{
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint32x2x4_t __attribute__ ((__always_inline__))
+-vld4_dup_u32 (const uint32_t * __a)
++ /* vmlaq_laneq */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_laneq_f32 (float32x4_t __a, float32x4_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- uint32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 0);
+- ret.val[1] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 1);
+- ret.val[2] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 2);
+- ret.val[3] = (uint32x2_t) __builtin_aarch64_get_dregxiv2si (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float16x4x4_t __attribute__ ((__always_inline__))
+-vld4_dup_f16 (const float16_t * __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_laneq_s16 (int16x8_t __a, int16x8_t __b,
++ int16x8_t __c, const int __lane)
+ {
+- float16x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 0);
+- ret.val[1] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 1);
+- ret.val[2] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 2);
+- ret.val[3] = (float16x4_t) __builtin_aarch64_get_dregxiv4hf (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float32x2x4_t __attribute__ ((__always_inline__))
+-vld4_dup_f32 (const float32_t * __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_laneq_s32 (int32x4_t __a, int32x4_t __b,
++ int32x4_t __c, const int __lane)
+ {
+- float32x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 0);
+- ret.val[1] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 1);
+- ret.val[2] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 2);
+- ret.val[3] = (float32x2_t) __builtin_aarch64_get_dregxiv2sf (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_s8 (const int8_t * __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_laneq_u16 (uint16x8_t __a, uint16x8_t __b,
++ uint16x8_t __c, const int __lane)
+ {
+- int8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (int8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline poly8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_p8 (const poly8_t * __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlaq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
++ uint32x4_t __c, const int __lane)
+ {
+- poly8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (poly8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline int16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_s16 (const int16_t * __a)
++/* vmls */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_f32 (float32x2_t a, float32x2_t b, float32x2_t c)
+ {
+- int16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (int16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return a - b * c;
+ }
+
+-__extension__ static __inline poly16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_p16 (const poly16_t * __a)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
+ {
+- poly16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (poly16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return __a - __b * __c;
+ }
+
+-__extension__ static __inline int32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_s32 (const int32_t * __a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_f32 (float32x4_t a, float32x4_t b, float32x4_t c)
+ {
+- int32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
+- ret.val[1] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
+- ret.val[2] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
+- ret.val[3] = (int32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
+- return ret;
++ return a - b * c;
+ }
+
+-__extension__ static __inline int64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_s64 (const int64_t * __a)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_f64 (float64x2_t a, float64x2_t b, float64x2_t c)
+ {
+- int64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
+- ret.val[1] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
+- ret.val[2] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
+- ret.val[3] = (int64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
+- return ret;
++ return a - b * c;
+ }
+
+-__extension__ static __inline uint8x16x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_u8 (const uint8_t * __a)
++/* vmls_lane */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_lane_f32 (float32x2_t __a, float32x2_t __b,
++ float32x2_t __c, const int __lane)
+ {
+- uint8x16x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv16qi ((const __builtin_aarch64_simd_qi *) __a);
+- ret.val[0] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 0);
+- ret.val[1] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 1);
+- ret.val[2] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 2);
+- ret.val[3] = (uint8x16_t) __builtin_aarch64_get_qregxiv16qi (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_u16 (const uint16_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_lane_s16 (int16x4_t __a, int16x4_t __b,
++ int16x4_t __c, const int __lane)
+ {
+- uint16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8hi ((const __builtin_aarch64_simd_hi *) __a);
+- ret.val[0] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 0);
+- ret.val[1] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 1);
+- ret.val[2] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 2);
+- ret.val[3] = (uint16x8_t) __builtin_aarch64_get_qregxiv8hi (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_u32 (const uint32_t * __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_lane_s32 (int32x2_t __a, int32x2_t __b,
++ int32x2_t __c, const int __lane)
+ {
+- uint32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4si ((const __builtin_aarch64_simd_si *) __a);
+- ret.val[0] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 0);
+- ret.val[1] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 1);
+- ret.val[2] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 2);
+- ret.val[3] = (uint32x4_t) __builtin_aarch64_get_qregxiv4si (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline uint64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_u64 (const uint64_t * __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_lane_u16 (uint16x4_t __a, uint16x4_t __b,
++ uint16x4_t __c, const int __lane)
+ {
+- uint64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2di ((const __builtin_aarch64_simd_di *) __a);
+- ret.val[0] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 0);
+- ret.val[1] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 1);
+- ret.val[2] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 2);
+- ret.val[3] = (uint64x2_t) __builtin_aarch64_get_qregxiv2di (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float16x8x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_f16 (const float16_t * __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_lane_u32 (uint32x2_t __a, uint32x2_t __b,
++ uint32x2_t __c, const int __lane)
+ {
+- float16x8x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv8hf ((const __builtin_aarch64_simd_hf *) __a);
+- ret.val[0] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 0);
+- ret.val[1] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 1);
+- ret.val[2] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 2);
+- ret.val[3] = (float16x8_t) __builtin_aarch64_get_qregxiv8hf (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float32x4x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_f32 (const float32_t * __a)
++/* vmls_laneq */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_laneq_f32 (float32x2_t __a, float32x2_t __b,
++ float32x4_t __c, const int __lane)
+ {
+- float32x4x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv4sf ((const __builtin_aarch64_simd_sf *) __a);
+- ret.val[0] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 0);
+- ret.val[1] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 1);
+- ret.val[2] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 2);
+- ret.val[3] = (float32x4_t) __builtin_aarch64_get_qregxiv4sf (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__extension__ static __inline float64x2x4_t __attribute__ ((__always_inline__))
+-vld4q_dup_f64 (const float64_t * __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_laneq_s16 (int16x4_t __a, int16x4_t __b,
++ int16x8_t __c, const int __lane)
+ {
+- float64x2x4_t ret;
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_ld4rv2df ((const __builtin_aarch64_simd_df *) __a);
+- ret.val[0] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 0);
+- ret.val[1] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 1);
+- ret.val[2] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 2);
+- ret.val[3] = (float64x2_t) __builtin_aarch64_get_qregxiv2df (__o, 3);
+- return ret;
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-/* vld2_lane */
+-
+-#define __LD2_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
+- qmode, ptrmode, funcsuffix, signedtype) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld2_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_oi __o; \
+- largetype __temp; \
+- __temp.val[0] = \
+- vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
+- __temp.val[1] = \
+- vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
+- __o = __builtin_aarch64_set_qregoi##qmode (__o, \
+- (signedtype) __temp.val[0], \
+- 0); \
+- __o = __builtin_aarch64_set_qregoi##qmode (__o, \
+- (signedtype) __temp.val[1], \
+- 1); \
+- __o = __builtin_aarch64_ld2_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- __b.val[0] = (vectype) __builtin_aarch64_get_dregoidi (__o, 0); \
+- __b.val[1] = (vectype) __builtin_aarch64_get_dregoidi (__o, 1); \
+- return __b; \
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_laneq_s32 (int32x2_t __a, int32x2_t __b,
++ int32x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__LD2_LANE_FUNC (float16x4x2_t, float16x4_t, float16x8x2_t, float16_t, v4hf,
+- v8hf, hf, f16, float16x8_t)
+-__LD2_LANE_FUNC (float32x2x2_t, float32x2_t, float32x4x2_t, float32_t, v2sf, v4sf,
+- sf, f32, float32x4_t)
+-__LD2_LANE_FUNC (float64x1x2_t, float64x1_t, float64x2x2_t, float64_t, df, v2df,
+- df, f64, float64x2_t)
+-__LD2_LANE_FUNC (poly8x8x2_t, poly8x8_t, poly8x16x2_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__LD2_LANE_FUNC (poly16x4x2_t, poly16x4_t, poly16x8x2_t, poly16_t, v4hi, v8hi, hi,
+- p16, int16x8_t)
+-__LD2_LANE_FUNC (int8x8x2_t, int8x8_t, int8x16x2_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__LD2_LANE_FUNC (int16x4x2_t, int16x4_t, int16x8x2_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__LD2_LANE_FUNC (int32x2x2_t, int32x2_t, int32x4x2_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__LD2_LANE_FUNC (int64x1x2_t, int64x1_t, int64x2x2_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__LD2_LANE_FUNC (uint8x8x2_t, uint8x8_t, uint8x16x2_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__LD2_LANE_FUNC (uint16x4x2_t, uint16x4_t, uint16x8x2_t, uint16_t, v4hi, v8hi, hi,
+- u16, int16x8_t)
+-__LD2_LANE_FUNC (uint32x2x2_t, uint32x2_t, uint32x4x2_t, uint32_t, v2si, v4si, si,
+- u32, int32x4_t)
+-__LD2_LANE_FUNC (uint64x1x2_t, uint64x1_t, uint64x2x2_t, uint64_t, di, v2di, di,
+- u64, int64x2_t)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_laneq_u16 (uint16x4_t __a, uint16x4_t __b,
++ uint16x8_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-#undef __LD2_LANE_FUNC
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmls_laneq_u32 (uint32x2_t __a, uint32x2_t __b,
++ uint32x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-/* vld2q_lane */
++/* vmlsq_lane */
+
+-#define __LD2_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld2q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_oi __o; \
+- intype ret; \
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[0], 0); \
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) __b.val[1], 1); \
+- __o = __builtin_aarch64_ld2_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- ret.val[0] = (vtype) __builtin_aarch64_get_qregoiv4si (__o, 0); \
+- ret.val[1] = (vtype) __builtin_aarch64_get_qregoiv4si (__o, 1); \
+- return ret; \
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_lane_f32 (float32x4_t __a, float32x4_t __b,
++ float32x2_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__LD2_LANE_FUNC (float16x8x2_t, float16x8_t, float16_t, v8hf, hf, f16)
+-__LD2_LANE_FUNC (float32x4x2_t, float32x4_t, float32_t, v4sf, sf, f32)
+-__LD2_LANE_FUNC (float64x2x2_t, float64x2_t, float64_t, v2df, df, f64)
+-__LD2_LANE_FUNC (poly8x16x2_t, poly8x16_t, poly8_t, v16qi, qi, p8)
+-__LD2_LANE_FUNC (poly16x8x2_t, poly16x8_t, poly16_t, v8hi, hi, p16)
+-__LD2_LANE_FUNC (int8x16x2_t, int8x16_t, int8_t, v16qi, qi, s8)
+-__LD2_LANE_FUNC (int16x8x2_t, int16x8_t, int16_t, v8hi, hi, s16)
+-__LD2_LANE_FUNC (int32x4x2_t, int32x4_t, int32_t, v4si, si, s32)
+-__LD2_LANE_FUNC (int64x2x2_t, int64x2_t, int64_t, v2di, di, s64)
+-__LD2_LANE_FUNC (uint8x16x2_t, uint8x16_t, uint8_t, v16qi, qi, u8)
+-__LD2_LANE_FUNC (uint16x8x2_t, uint16x8_t, uint16_t, v8hi, hi, u16)
+-__LD2_LANE_FUNC (uint32x4x2_t, uint32x4_t, uint32_t, v4si, si, u32)
+-__LD2_LANE_FUNC (uint64x2x2_t, uint64x2_t, uint64_t, v2di, di, u64)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_lane_s16 (int16x8_t __a, int16x8_t __b,
++ int16x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-#undef __LD2_LANE_FUNC
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_lane_s32 (int32x4_t __a, int32x4_t __b,
++ int32x2_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-/* vld3_lane */
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_lane_u16 (uint16x8_t __a, uint16x8_t __b,
++ uint16x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-#define __LD3_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
+- qmode, ptrmode, funcsuffix, signedtype) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld3_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_ci __o; \
+- largetype __temp; \
+- __temp.val[0] = \
+- vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
+- __temp.val[1] = \
+- vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
+- __temp.val[2] = \
+- vcombine_##funcsuffix (__b.val[2], vcreate_##funcsuffix (0)); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[0], \
+- 0); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[1], \
+- 1); \
+- __o = __builtin_aarch64_set_qregci##qmode (__o, \
+- (signedtype) __temp.val[2], \
+- 2); \
+- __o = __builtin_aarch64_ld3_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- __b.val[0] = (vectype) __builtin_aarch64_get_dregcidi (__o, 0); \
+- __b.val[1] = (vectype) __builtin_aarch64_get_dregcidi (__o, 1); \
+- __b.val[2] = (vectype) __builtin_aarch64_get_dregcidi (__o, 2); \
+- return __b; \
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_lane_u32 (uint32x4_t __a, uint32x4_t __b,
++ uint32x2_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__LD3_LANE_FUNC (float16x4x3_t, float16x4_t, float16x8x3_t, float16_t, v4hf,
+- v8hf, hf, f16, float16x8_t)
+-__LD3_LANE_FUNC (float32x2x3_t, float32x2_t, float32x4x3_t, float32_t, v2sf, v4sf,
+- sf, f32, float32x4_t)
+-__LD3_LANE_FUNC (float64x1x3_t, float64x1_t, float64x2x3_t, float64_t, df, v2df,
+- df, f64, float64x2_t)
+-__LD3_LANE_FUNC (poly8x8x3_t, poly8x8_t, poly8x16x3_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__LD3_LANE_FUNC (poly16x4x3_t, poly16x4_t, poly16x8x3_t, poly16_t, v4hi, v8hi, hi,
+- p16, int16x8_t)
+-__LD3_LANE_FUNC (int8x8x3_t, int8x8_t, int8x16x3_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__LD3_LANE_FUNC (int16x4x3_t, int16x4_t, int16x8x3_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__LD3_LANE_FUNC (int32x2x3_t, int32x2_t, int32x4x3_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__LD3_LANE_FUNC (int64x1x3_t, int64x1_t, int64x2x3_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__LD3_LANE_FUNC (uint8x8x3_t, uint8x8_t, uint8x16x3_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__LD3_LANE_FUNC (uint16x4x3_t, uint16x4_t, uint16x8x3_t, uint16_t, v4hi, v8hi, hi,
+- u16, int16x8_t)
+-__LD3_LANE_FUNC (uint32x2x3_t, uint32x2_t, uint32x4x3_t, uint32_t, v2si, v4si, si,
+- u32, int32x4_t)
+-__LD3_LANE_FUNC (uint64x1x3_t, uint64x1_t, uint64x2x3_t, uint64_t, di, v2di, di,
+- u64, int64x2_t)
++ /* vmlsq_laneq */
+
+-#undef __LD3_LANE_FUNC
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_laneq_f32 (float32x4_t __a, float32x4_t __b,
++ float32x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-/* vld3q_lane */
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_laneq_s16 (int16x8_t __a, int16x8_t __b,
++ int16x8_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-#define __LD3_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld3q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_ci __o; \
+- intype ret; \
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[0], 0); \
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[1], 1); \
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) __b.val[2], 2); \
+- __o = __builtin_aarch64_ld3_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- ret.val[0] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 0); \
+- ret.val[1] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 1); \
+- ret.val[2] = (vtype) __builtin_aarch64_get_qregciv4si (__o, 2); \
+- return ret; \
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_laneq_s32 (int32x4_t __a, int32x4_t __b,
++ int32x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_laneq_u16 (uint16x8_t __a, uint16x8_t __b,
++ uint16x8_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
+ }
+
+-__LD3_LANE_FUNC (float16x8x3_t, float16x8_t, float16_t, v8hf, hf, f16)
+-__LD3_LANE_FUNC (float32x4x3_t, float32x4_t, float32_t, v4sf, sf, f32)
+-__LD3_LANE_FUNC (float64x2x3_t, float64x2_t, float64_t, v2df, df, f64)
+-__LD3_LANE_FUNC (poly8x16x3_t, poly8x16_t, poly8_t, v16qi, qi, p8)
+-__LD3_LANE_FUNC (poly16x8x3_t, poly16x8_t, poly16_t, v8hi, hi, p16)
+-__LD3_LANE_FUNC (int8x16x3_t, int8x16_t, int8_t, v16qi, qi, s8)
+-__LD3_LANE_FUNC (int16x8x3_t, int16x8_t, int16_t, v8hi, hi, s16)
+-__LD3_LANE_FUNC (int32x4x3_t, int32x4_t, int32_t, v4si, si, s32)
+-__LD3_LANE_FUNC (int64x2x3_t, int64x2_t, int64_t, v2di, di, s64)
+-__LD3_LANE_FUNC (uint8x16x3_t, uint8x16_t, uint8_t, v16qi, qi, u8)
+-__LD3_LANE_FUNC (uint16x8x3_t, uint16x8_t, uint16_t, v8hi, hi, u16)
+-__LD3_LANE_FUNC (uint32x4x3_t, uint32x4_t, uint32_t, v4si, si, u32)
+-__LD3_LANE_FUNC (uint64x2x3_t, uint64x2_t, uint64_t, v2di, di, u64)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmlsq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
++ uint32x4_t __c, const int __lane)
++{
++ return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++}
+
+-#undef __LD3_LANE_FUNC
++/* vmov_n_ */
+
+-/* vld4_lane */
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_f16 (float16_t __a)
++{
++ return vdup_n_f16 (__a);
++}
+
+-#define __LD4_LANE_FUNC(intype, vectype, largetype, ptrtype, mode, \
+- qmode, ptrmode, funcsuffix, signedtype) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld4_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_xi __o; \
+- largetype __temp; \
+- __temp.val[0] = \
+- vcombine_##funcsuffix (__b.val[0], vcreate_##funcsuffix (0)); \
+- __temp.val[1] = \
+- vcombine_##funcsuffix (__b.val[1], vcreate_##funcsuffix (0)); \
+- __temp.val[2] = \
+- vcombine_##funcsuffix (__b.val[2], vcreate_##funcsuffix (0)); \
+- __temp.val[3] = \
+- vcombine_##funcsuffix (__b.val[3], vcreate_##funcsuffix (0)); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[0], \
+- 0); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[1], \
+- 1); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[2], \
+- 2); \
+- __o = __builtin_aarch64_set_qregxi##qmode (__o, \
+- (signedtype) __temp.val[3], \
+- 3); \
+- __o = __builtin_aarch64_ld4_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- __b.val[0] = (vectype) __builtin_aarch64_get_dregxidi (__o, 0); \
+- __b.val[1] = (vectype) __builtin_aarch64_get_dregxidi (__o, 1); \
+- __b.val[2] = (vectype) __builtin_aarch64_get_dregxidi (__o, 2); \
+- __b.val[3] = (vectype) __builtin_aarch64_get_dregxidi (__o, 3); \
+- return __b; \
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_f32 (float32_t __a)
++{
++ return vdup_n_f32 (__a);
+ }
+
+-/* vld4q_lane */
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_f64 (float64_t __a)
++{
++ return (float64x1_t) {__a};
++}
+
+-__LD4_LANE_FUNC (float16x4x4_t, float16x4_t, float16x8x4_t, float16_t, v4hf,
+- v8hf, hf, f16, float16x8_t)
+-__LD4_LANE_FUNC (float32x2x4_t, float32x2_t, float32x4x4_t, float32_t, v2sf, v4sf,
+- sf, f32, float32x4_t)
+-__LD4_LANE_FUNC (float64x1x4_t, float64x1_t, float64x2x4_t, float64_t, df, v2df,
+- df, f64, float64x2_t)
+-__LD4_LANE_FUNC (poly8x8x4_t, poly8x8_t, poly8x16x4_t, poly8_t, v8qi, v16qi, qi, p8,
+- int8x16_t)
+-__LD4_LANE_FUNC (poly16x4x4_t, poly16x4_t, poly16x8x4_t, poly16_t, v4hi, v8hi, hi,
+- p16, int16x8_t)
+-__LD4_LANE_FUNC (int8x8x4_t, int8x8_t, int8x16x4_t, int8_t, v8qi, v16qi, qi, s8,
+- int8x16_t)
+-__LD4_LANE_FUNC (int16x4x4_t, int16x4_t, int16x8x4_t, int16_t, v4hi, v8hi, hi, s16,
+- int16x8_t)
+-__LD4_LANE_FUNC (int32x2x4_t, int32x2_t, int32x4x4_t, int32_t, v2si, v4si, si, s32,
+- int32x4_t)
+-__LD4_LANE_FUNC (int64x1x4_t, int64x1_t, int64x2x4_t, int64_t, di, v2di, di, s64,
+- int64x2_t)
+-__LD4_LANE_FUNC (uint8x8x4_t, uint8x8_t, uint8x16x4_t, uint8_t, v8qi, v16qi, qi, u8,
+- int8x16_t)
+-__LD4_LANE_FUNC (uint16x4x4_t, uint16x4_t, uint16x8x4_t, uint16_t, v4hi, v8hi, hi,
+- u16, int16x8_t)
+-__LD4_LANE_FUNC (uint32x2x4_t, uint32x2_t, uint32x4x4_t, uint32_t, v2si, v4si, si,
+- u32, int32x4_t)
+-__LD4_LANE_FUNC (uint64x1x4_t, uint64x1_t, uint64x2x4_t, uint64_t, di, v2di, di,
+- u64, int64x2_t)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_p8 (poly8_t __a)
++{
++ return vdup_n_p8 (__a);
++}
++
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_p16 (poly16_t __a)
++{
++ return vdup_n_p16 (__a);
++}
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_s8 (int8_t __a)
++{
++ return vdup_n_s8 (__a);
++}
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_s16 (int16_t __a)
++{
++ return vdup_n_s16 (__a);
++}
+
+-#undef __LD4_LANE_FUNC
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_s32 (int32_t __a)
++{
++ return vdup_n_s32 (__a);
++}
+
+-/* vld4q_lane */
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_s64 (int64_t __a)
++{
++ return (int64x1_t) {__a};
++}
+
+-#define __LD4_LANE_FUNC(intype, vtype, ptrtype, mode, ptrmode, funcsuffix) \
+-__extension__ static __inline intype __attribute__ ((__always_inline__)) \
+-vld4q_lane_##funcsuffix (const ptrtype * __ptr, intype __b, const int __c) \
+-{ \
+- __builtin_aarch64_simd_xi __o; \
+- intype ret; \
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[0], 0); \
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[1], 1); \
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[2], 2); \
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) __b.val[3], 3); \
+- __o = __builtin_aarch64_ld4_lane##mode ( \
+- (__builtin_aarch64_simd_##ptrmode *) __ptr, __o, __c); \
+- ret.val[0] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 0); \
+- ret.val[1] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 1); \
+- ret.val[2] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 2); \
+- ret.val[3] = (vtype) __builtin_aarch64_get_qregxiv4si (__o, 3); \
+- return ret; \
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_u8 (uint8_t __a)
++{
++ return vdup_n_u8 (__a);
+ }
+
+-__LD4_LANE_FUNC (float16x8x4_t, float16x8_t, float16_t, v8hf, hf, f16)
+-__LD4_LANE_FUNC (float32x4x4_t, float32x4_t, float32_t, v4sf, sf, f32)
+-__LD4_LANE_FUNC (float64x2x4_t, float64x2_t, float64_t, v2df, df, f64)
+-__LD4_LANE_FUNC (poly8x16x4_t, poly8x16_t, poly8_t, v16qi, qi, p8)
+-__LD4_LANE_FUNC (poly16x8x4_t, poly16x8_t, poly16_t, v8hi, hi, p16)
+-__LD4_LANE_FUNC (int8x16x4_t, int8x16_t, int8_t, v16qi, qi, s8)
+-__LD4_LANE_FUNC (int16x8x4_t, int16x8_t, int16_t, v8hi, hi, s16)
+-__LD4_LANE_FUNC (int32x4x4_t, int32x4_t, int32_t, v4si, si, s32)
+-__LD4_LANE_FUNC (int64x2x4_t, int64x2_t, int64_t, v2di, di, s64)
+-__LD4_LANE_FUNC (uint8x16x4_t, uint8x16_t, uint8_t, v16qi, qi, u8)
+-__LD4_LANE_FUNC (uint16x8x4_t, uint16x8_t, uint16_t, v8hi, hi, u16)
+-__LD4_LANE_FUNC (uint32x4x4_t, uint32x4_t, uint32_t, v4si, si, u32)
+-__LD4_LANE_FUNC (uint64x2x4_t, uint64x2_t, uint64_t, v2di, di, u64)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_u16 (uint16_t __a)
++{
++ return vdup_n_u16 (__a);
++}
+
+-#undef __LD4_LANE_FUNC
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_u32 (uint32_t __a)
++{
++ return vdup_n_u32 (__a);
++}
+
+-/* vmax */
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmov_n_u64 (uint64_t __a)
++{
++ return (uint64x1_t) {__a};
++}
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmax_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_f16 (float16_t __a)
+ {
+- return __builtin_aarch64_smax_nanv2sf (__a, __b);
++ return vdupq_n_f16 (__a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmax_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_f32 (float32_t __a)
+ {
+- return __builtin_aarch64_smaxv8qi (__a, __b);
++ return vdupq_n_f32 (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmax_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_smaxv4hi (__a, __b);
++ return vdupq_n_f64 (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmax_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_p8 (poly8_t __a)
+ {
+- return __builtin_aarch64_smaxv2si (__a, __b);
++ return vdupq_n_p8 (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmax_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_p16 (poly16_t __a)
+ {
+- return (uint8x8_t) __builtin_aarch64_umaxv8qi ((int8x8_t) __a,
+- (int8x8_t) __b);
++ return vdupq_n_p16 (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmax_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_s8 (int8_t __a)
+ {
+- return (uint16x4_t) __builtin_aarch64_umaxv4hi ((int16x4_t) __a,
+- (int16x4_t) __b);
++ return vdupq_n_s8 (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmax_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_s16 (int16_t __a)
+ {
+- return (uint32x2_t) __builtin_aarch64_umaxv2si ((int32x2_t) __a,
+- (int32x2_t) __b);
++ return vdupq_n_s16 (__a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmaxq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_s32 (int32_t __a)
+ {
+- return __builtin_aarch64_smax_nanv4sf (__a, __b);
++ return vdupq_n_s32 (__a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmaxq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_s64 (int64_t __a)
+ {
+- return __builtin_aarch64_smax_nanv2df (__a, __b);
++ return vdupq_n_s64 (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmaxq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_u8 (uint8_t __a)
+ {
+- return __builtin_aarch64_smaxv16qi (__a, __b);
++ return vdupq_n_u8 (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmaxq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_u16 (uint16_t __a)
+ {
+- return __builtin_aarch64_smaxv8hi (__a, __b);
++ return vdupq_n_u16 (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmaxq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_u32 (uint32_t __a)
+ {
+- return __builtin_aarch64_smaxv4si (__a, __b);
++ return vdupq_n_u32 (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmaxq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmovq_n_u64 (uint64_t __a)
+ {
+- return (uint8x16_t) __builtin_aarch64_umaxv16qi ((int8x16_t) __a,
+- (int8x16_t) __b);
++ return vdupq_n_u64 (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmaxq_u16 (uint16x8_t __a, uint16x8_t __b)
++/* vmul_lane */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_f32 (float32x2_t __a, float32x2_t __b, const int __lane)
+ {
+- return (uint16x8_t) __builtin_aarch64_umaxv8hi ((int16x8_t) __a,
+- (int16x8_t) __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmaxq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_f64 (float64x1_t __a, float64x1_t __b, const int __lane)
+ {
+- return (uint32x4_t) __builtin_aarch64_umaxv4si ((int32x4_t) __a,
+- (int32x4_t) __b);
++ return __a * __b;
+ }
+-/* vmulx */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmulx_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_s16 (int16x4_t __a, int16x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_fmulxv2sf (__a, __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulxq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_s32 (int32x2_t __a, int32x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_fmulxv4sf (__a, __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmulx_f64 (float64x1_t __a, float64x1_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_u16 (uint16x4_t __a, uint16x4_t __b, const int __lane)
+ {
+- return (float64x1_t) {__builtin_aarch64_fmulxdf (__a[0], __b[0])};
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulxq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_u32 (uint32x2_t __a, uint32x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_fmulxv2df (__a, __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmulxs_f32 (float32_t __a, float32_t __b)
++/* vmuld_lane */
++
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmuld_lane_f64 (float64_t __a, float64x1_t __b, const int __lane)
+ {
+- return __builtin_aarch64_fmulxsf (__a, __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmulxd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmuld_laneq_f64 (float64_t __a, float64x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_fmulxdf (__a, __b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmulx_lane_f32 (float32x2_t __a, float32x2_t __v, const int __lane)
++/* vmuls_lane */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmuls_lane_f32 (float32_t __a, float32x2_t __b, const int __lane)
+ {
+- return vmulx_f32 (__a, __aarch64_vdup_lane_f32 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmulx_lane_f64 (float64x1_t __a, float64x1_t __v, const int __lane)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmuls_laneq_f32 (float32_t __a, float32x4_t __b, const int __lane)
+ {
+- return vmulx_f64 (__a, __aarch64_vdup_lane_f64 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulxq_lane_f32 (float32x4_t __a, float32x2_t __v, const int __lane)
++/* vmul_laneq */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_f32 (float32x2_t __a, float32x4_t __b, const int __lane)
+ {
+- return vmulxq_f32 (__a, __aarch64_vdupq_lane_f32 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulxq_lane_f64 (float64x2_t __a, float64x1_t __v, const int __lane)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_f64 (float64x1_t __a, float64x2_t __b, const int __lane)
+ {
+- return vmulxq_f64 (__a, __aarch64_vdupq_lane_f64 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmulx_laneq_f32 (float32x2_t __a, float32x4_t __v, const int __lane)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __lane)
+ {
+- return vmulx_f32 (__a, __aarch64_vdup_laneq_f32 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmulx_laneq_f64 (float64x1_t __a, float64x2_t __v, const int __lane)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __lane)
+ {
+- return vmulx_f64 (__a, __aarch64_vdup_laneq_f64 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulxq_laneq_f32 (float32x4_t __a, float32x4_t __v, const int __lane)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_u16 (uint16x4_t __a, uint16x8_t __b, const int __lane)
+ {
+- return vmulxq_f32 (__a, __aarch64_vdupq_laneq_f32 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulxq_laneq_f64 (float64x2_t __a, float64x2_t __v, const int __lane)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_u32 (uint32x2_t __a, uint32x4_t __b, const int __lane)
+ {
+- return vmulxq_f64 (__a, __aarch64_vdupq_laneq_f64 (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmulxs_lane_f32 (float32_t __a, float32x2_t __v, const int __lane)
++/* vmul_n */
++
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_f64 (float64x1_t __a, float64_t __b)
+ {
+- return vmulxs_f32 (__a, __aarch64_vget_lane_any (__v, __lane));
++ return (float64x1_t) { vget_lane_f64 (__a, 0) * __b };
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmulxs_laneq_f32 (float32_t __a, float32x4_t __v, const int __lane)
++/* vmulq_lane */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_f32 (float32x4_t __a, float32x2_t __b, const int __lane)
+ {
+- return vmulxs_f32 (__a, __aarch64_vget_lane_any (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmulxd_lane_f64 (float64_t __a, float64x1_t __v, const int __lane)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_f64 (float64x2_t __a, float64x1_t __b, const int __lane)
+ {
+- return vmulxd_f64 (__a, __aarch64_vget_lane_any (__v, __lane));
++ __AARCH64_LANE_CHECK (__a, __lane);
++ return __a * __b[0];
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmulxd_laneq_f64 (float64_t __a, float64x2_t __v, const int __lane)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __lane)
+ {
+- return vmulxd_f64 (__a, __aarch64_vget_lane_any (__v, __lane));
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-/* vpmax */
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __lane)
++{
++ return __a * __aarch64_vget_lane_any (__b, __lane);
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vpmax_s8 (int8x8_t a, int8x8_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_u16 (uint16x8_t __a, uint16x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_smaxpv8qi (a, b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vpmax_s16 (int16x4_t a, int16x4_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_u32 (uint32x4_t __a, uint32x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_smaxpv4hi (a, b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vpmax_s32 (int32x2_t a, int32x2_t b)
++/* vmulq_laneq */
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_f32 (float32x4_t __a, float32x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_smaxpv2si (a, b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vpmax_u8 (uint8x8_t a, uint8x8_t b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_f64 (float64x2_t __a, float64x2_t __b, const int __lane)
+ {
+- return (uint8x8_t) __builtin_aarch64_umaxpv8qi ((int8x8_t) a,
+- (int8x8_t) b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vpmax_u16 (uint16x4_t a, uint16x4_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __lane)
+ {
+- return (uint16x4_t) __builtin_aarch64_umaxpv4hi ((int16x4_t) a,
+- (int16x4_t) b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vpmax_u32 (uint32x2_t a, uint32x2_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __lane)
+ {
+- return (uint32x2_t) __builtin_aarch64_umaxpv2si ((int32x2_t) a,
+- (int32x2_t) b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vpmaxq_s8 (int8x16_t a, int8x16_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_u16 (uint16x8_t __a, uint16x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_smaxpv16qi (a, b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vpmaxq_s16 (int16x8_t a, int16x8_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_u32 (uint32x4_t __a, uint32x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_smaxpv8hi (a, b);
++ return __a * __aarch64_vget_lane_any (__b, __lane);
++}
++
++/* vmul_n. */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_f32 (float32x2_t __a, float32_t __b)
++{
++ return __a * __b;
++}
++
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_f32 (float32x4_t __a, float32_t __b)
++{
++ return __a * __b;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vpmaxq_s32 (int32x4_t a, int32x4_t b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_f64 (float64x2_t __a, float64_t __b)
+ {
+- return __builtin_aarch64_smaxpv4si (a, b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vpmaxq_u8 (uint8x16_t a, uint8x16_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_s16 (int16x4_t __a, int16_t __b)
+ {
+- return (uint8x16_t) __builtin_aarch64_umaxpv16qi ((int8x16_t) a,
+- (int8x16_t) b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vpmaxq_u16 (uint16x8_t a, uint16x8_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_s16 (int16x8_t __a, int16_t __b)
+ {
+- return (uint16x8_t) __builtin_aarch64_umaxpv8hi ((int16x8_t) a,
+- (int16x8_t) b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vpmaxq_u32 (uint32x4_t a, uint32x4_t b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_s32 (int32x2_t __a, int32_t __b)
+ {
+- return (uint32x4_t) __builtin_aarch64_umaxpv4si ((int32x4_t) a,
+- (int32x4_t) b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vpmax_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_s32 (int32x4_t __a, int32_t __b)
+ {
+- return __builtin_aarch64_smax_nanpv2sf (a, b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vpmaxq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_u16 (uint16x4_t __a, uint16_t __b)
+ {
+- return __builtin_aarch64_smax_nanpv4sf (a, b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vpmaxq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_u16 (uint16x8_t __a, uint16_t __b)
+ {
+- return __builtin_aarch64_smax_nanpv2df (a, b);
++ return __a * __b;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vpmaxqd_f64 (float64x2_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_u32 (uint32x2_t __a, uint32_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_nan_scal_v2df (a);
++ return __a * __b;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vpmaxs_f32 (float32x2_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_u32 (uint32x4_t __a, uint32_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_nan_scal_v2sf (a);
++ return __a * __b;
+ }
+
+-/* vpmaxnm */
++/* vmvn */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vpmaxnm_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_p8 (poly8x8_t __a)
+ {
+- return __builtin_aarch64_smaxpv2sf (a, b);
++ return (poly8x8_t) ~((int8x8_t) __a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vpmaxnmq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_smaxpv4sf (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vpmaxnmq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_s16 (int16x4_t __a)
+ {
+- return __builtin_aarch64_smaxpv2df (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vpmaxnmqd_f64 (float64x2_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_s32 (int32x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v2df (a);
++ return ~__a;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vpmaxnms_f32 (float32x2_t a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_u8 (uint8x8_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v2sf (a);
++ return ~__a;
+ }
+
+-/* vpmin */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vpmin_s8 (int8x8_t a, int8x8_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_u16 (uint16x4_t __a)
+ {
+- return __builtin_aarch64_sminpv8qi (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vpmin_s16 (int16x4_t a, int16x4_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvn_u32 (uint32x2_t __a)
+ {
+- return __builtin_aarch64_sminpv4hi (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vpmin_s32 (int32x2_t a, int32x2_t b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_p8 (poly8x16_t __a)
+ {
+- return __builtin_aarch64_sminpv2si (a, b);
++ return (poly8x16_t) ~((int8x16_t) __a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vpmin_u8 (uint8x8_t a, uint8x8_t b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_s8 (int8x16_t __a)
+ {
+- return (uint8x8_t) __builtin_aarch64_uminpv8qi ((int8x8_t) a,
+- (int8x8_t) b);
++ return ~__a;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vpmin_u16 (uint16x4_t a, uint16x4_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_s16 (int16x8_t __a)
+ {
+- return (uint16x4_t) __builtin_aarch64_uminpv4hi ((int16x4_t) a,
+- (int16x4_t) b);
++ return ~__a;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vpmin_u32 (uint32x2_t a, uint32x2_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_s32 (int32x4_t __a)
+ {
+- return (uint32x2_t) __builtin_aarch64_uminpv2si ((int32x2_t) a,
+- (int32x2_t) b);
++ return ~__a;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vpminq_s8 (int8x16_t a, int8x16_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_u8 (uint8x16_t __a)
+ {
+- return __builtin_aarch64_sminpv16qi (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vpminq_s16 (int16x8_t a, int16x8_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_u16 (uint16x8_t __a)
+ {
+- return __builtin_aarch64_sminpv8hi (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vpminq_s32 (int32x4_t a, int32x4_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmvnq_u32 (uint32x4_t __a)
+ {
+- return __builtin_aarch64_sminpv4si (a, b);
++ return ~__a;
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vpminq_u8 (uint8x16_t a, uint8x16_t b)
+-{
+- return (uint8x16_t) __builtin_aarch64_uminpv16qi ((int8x16_t) a,
+- (int8x16_t) b);
+-}
++/* vneg */
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vpminq_u16 (uint16x8_t a, uint16x8_t b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_f32 (float32x2_t __a)
+ {
+- return (uint16x8_t) __builtin_aarch64_uminpv8hi ((int16x8_t) a,
+- (int16x8_t) b);
++ return -__a;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vpminq_u32 (uint32x4_t a, uint32x4_t b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_f64 (float64x1_t __a)
+ {
+- return (uint32x4_t) __builtin_aarch64_uminpv4si ((int32x4_t) a,
+- (int32x4_t) b);
++ return -__a;
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vpmin_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_smin_nanpv2sf (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vpminq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_s16 (int16x4_t __a)
+ {
+- return __builtin_aarch64_smin_nanpv4sf (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vpminq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_s32 (int32x2_t __a)
+ {
+- return __builtin_aarch64_smin_nanpv2df (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vpminqd_f64 (float64x2_t a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_s64 (int64x1_t __a)
+ {
+- return __builtin_aarch64_reduc_smin_nan_scal_v2df (a);
++ return -__a;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vpmins_f32 (float32x2_t a)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_reduc_smin_nan_scal_v2sf (a);
++ return -__a;
+ }
+
+-/* vpminnm */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vpminnm_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_sminpv2sf (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vpminnmq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_sminpv4sf (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vpminnmq_f64 (float64x2_t a, float64x2_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_s16 (int16x8_t __a)
+ {
+- return __builtin_aarch64_sminpv2df (a, b);
++ return -__a;
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vpminnmqd_f64 (float64x2_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_s32 (int32x4_t __a)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v2df (a);
++ return -__a;
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vpminnms_f32 (float32x2_t a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_s64 (int64x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v2sf (a);
++ return -__a;
+ }
+
+-/* vmaxnm */
++/* vpadd */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __builtin_aarch64_fmaxv2sf (__a, __b);
++ return __builtin_aarch64_faddpv2sf (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- return __builtin_aarch64_fmaxv4sf (__a, __b);
++ return __builtin_aarch64_faddpv4sf (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- return __builtin_aarch64_fmaxv2df (__a, __b);
++ return __builtin_aarch64_faddpv2df (__a, __b);
+ }
+
+-/* vmaxv */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmaxv_f32 (float32x2_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_nan_scal_v2sf (__a);
++ return __builtin_aarch64_addpv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vmaxv_s8 (int8x8_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v8qi (__a);
++ return __builtin_aarch64_addpv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vmaxv_s16 (int16x4_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v4hi (__a);
++ return __builtin_aarch64_addpv2si (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vmaxv_s32 (int32x2_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v2si (__a);
++ return (uint8x8_t) __builtin_aarch64_addpv8qi ((int8x8_t) __a,
++ (int8x8_t) __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vmaxv_u8 (uint8x8_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v8qi_uu (__a);
++ return (uint16x4_t) __builtin_aarch64_addpv4hi ((int16x4_t) __a,
++ (int16x4_t) __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vmaxv_u16 (uint16x4_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v4hi_uu (__a);
++ return (uint32x2_t) __builtin_aarch64_addpv2si ((int32x2_t) __a,
++ (int32x2_t) __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vmaxv_u32 (uint32x2_t __a)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadds_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v2si_uu (__a);
++ return __builtin_aarch64_reduc_plus_scal_v2sf (__a);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmaxvq_f32 (float32x4_t __a)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddd_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_nan_scal_v4sf (__a);
++ return __builtin_aarch64_reduc_plus_scal_v2df (__a);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmaxvq_f64 (float64x2_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddd_s64 (int64x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_nan_scal_v2df (__a);
++ return __builtin_aarch64_addpdi (__a);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vmaxvq_s8 (int8x16_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddd_u64 (uint64x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v16qi (__a);
++ return __builtin_aarch64_addpdi ((int64x2_t) __a);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vmaxvq_s16 (int16x8_t __a)
++/* vqabs */
++
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqabsq_s64 (int64x2_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v8hi (__a);
++ return (int64x2_t) __builtin_aarch64_sqabsv2di (__a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vmaxvq_s32 (int32x4_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqabsb_s8 (int8_t __a)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v4si (__a);
++ return (int8_t) __builtin_aarch64_sqabsqi (__a);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vmaxvq_u8 (uint8x16_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqabsh_s16 (int16_t __a)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v16qi_uu (__a);
++ return (int16_t) __builtin_aarch64_sqabshi (__a);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vmaxvq_u16 (uint16x8_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqabss_s32 (int32_t __a)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v8hi_uu (__a);
++ return (int32_t) __builtin_aarch64_sqabssi (__a);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vmaxvq_u32 (uint32x4_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqabsd_s64 (int64_t __a)
+ {
+- return __builtin_aarch64_reduc_umax_scal_v4si_uu (__a);
++ return __builtin_aarch64_sqabsdi (__a);
+ }
+
+-/* vmaxnmv */
++/* vqadd */
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmaxnmv_f32 (float32x2_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddb_s8 (int8_t __a, int8_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v2sf (__a);
++ return (int8_t) __builtin_aarch64_sqaddqi (__a, __b);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmaxnmvq_f32 (float32x4_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddh_s16 (int16_t __a, int16_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v4sf (__a);
++ return (int16_t) __builtin_aarch64_sqaddhi (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmaxnmvq_f64 (float64x2_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqadds_s32 (int32_t __a, int32_t __b)
+ {
+- return __builtin_aarch64_reduc_smax_scal_v2df (__a);
++ return (int32_t) __builtin_aarch64_sqaddsi (__a, __b);
+ }
+
+-/* vmin */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmin_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddd_s64 (int64_t __a, int64_t __b)
+ {
+- return __builtin_aarch64_smin_nanv2sf (__a, __b);
++ return __builtin_aarch64_sqadddi (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmin_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddb_u8 (uint8_t __a, uint8_t __b)
+ {
+- return __builtin_aarch64_sminv8qi (__a, __b);
++ return (uint8_t) __builtin_aarch64_uqaddqi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmin_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddh_u16 (uint16_t __a, uint16_t __b)
+ {
+- return __builtin_aarch64_sminv4hi (__a, __b);
++ return (uint16_t) __builtin_aarch64_uqaddhi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmin_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqadds_u32 (uint32_t __a, uint32_t __b)
+ {
+- return __builtin_aarch64_sminv2si (__a, __b);
++ return (uint32_t) __builtin_aarch64_uqaddsi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmin_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqaddd_u64 (uint64_t __a, uint64_t __b)
+ {
+- return (uint8x8_t) __builtin_aarch64_uminv8qi ((int8x8_t) __a,
+- (int8x8_t) __b);
++ return __builtin_aarch64_uqadddi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmin_u16 (uint16x4_t __a, uint16x4_t __b)
+-{
+- return (uint16x4_t) __builtin_aarch64_uminv4hi ((int16x4_t) __a,
+- (int16x4_t) __b);
+-}
++/* vqdmlal */
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmin_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
+ {
+- return (uint32x2_t) __builtin_aarch64_uminv2si ((int32x2_t) __a,
+- (int32x2_t) __b);
++ return __builtin_aarch64_sqdmlalv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vminq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
+ {
+- return __builtin_aarch64_smin_nanv4sf (__a, __b);
++ return __builtin_aarch64_sqdmlal2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vminq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_smin_nanv2df (__a, __b);
++ return __builtin_aarch64_sqdmlal2_lanev8hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vminq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_laneq_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_sminv16qi (__a, __b);
++ return __builtin_aarch64_sqdmlal2_laneqv8hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vminq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
+ {
+- return __builtin_aarch64_sminv8hi (__a, __b);
++ return __builtin_aarch64_sqdmlal2_nv8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vminq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
+ {
+- return __builtin_aarch64_sminv4si (__a, __b);
++ return __builtin_aarch64_sqdmlal_lanev4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vminq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_laneq_s16 (int32x4_t __a, int16x4_t __b, int16x8_t __c, int const __d)
+ {
+- return (uint8x16_t) __builtin_aarch64_uminv16qi ((int8x16_t) __a,
+- (int8x16_t) __b);
++ return __builtin_aarch64_sqdmlal_laneqv4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vminq_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
+ {
+- return (uint16x8_t) __builtin_aarch64_uminv8hi ((int16x8_t) __a,
+- (int16x8_t) __b);
++ return __builtin_aarch64_sqdmlal_nv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vminq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
+ {
+- return (uint32x4_t) __builtin_aarch64_uminv4si ((int32x4_t) __a,
+- (int32x4_t) __b);
++ return __builtin_aarch64_sqdmlalv2si (__a, __b, __c);
+ }
+
+-/* vminnm */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vminnm_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
+ {
+- return __builtin_aarch64_fminv2sf (__a, __b);
++ return __builtin_aarch64_sqdmlal2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vminnmq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_fminv4sf (__a, __b);
++ return __builtin_aarch64_sqdmlal2_lanev4si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vminnmq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_laneq_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_fminv2df (__a, __b);
++ return __builtin_aarch64_sqdmlal2_laneqv4si (__a, __b, __c, __d);
+ }
+
+-/* vminv */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vminv_f32 (float32x2_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_nan_scal_v2sf (__a);
++ return __builtin_aarch64_sqdmlal2_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vminv_s8 (int8x8_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v8qi (__a);
++ return __builtin_aarch64_sqdmlal_lanev2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vminv_s16 (int16x4_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_laneq_s32 (int64x2_t __a, int32x2_t __b, int32x4_t __c, int const __d)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v4hi (__a);
++ return __builtin_aarch64_sqdmlal_laneqv2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vminv_s32 (int32x2_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v2si (__a);
++ return __builtin_aarch64_sqdmlal_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vminv_u8 (uint8x8_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlalh_s16 (int32_t __a, int16_t __b, int16_t __c)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v8qi_uu (__a);
++ return __builtin_aarch64_sqdmlalhi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vminv_u16 (uint16x4_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlalh_lane_s16 (int32_t __a, int16_t __b, int16x4_t __c, const int __d)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v4hi_uu (__a);
++ return __builtin_aarch64_sqdmlal_lanehi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vminv_u32 (uint32x2_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlalh_laneq_s16 (int32_t __a, int16_t __b, int16x8_t __c, const int __d)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v2si_uu (__a);
++ return __builtin_aarch64_sqdmlal_laneqhi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vminvq_f32 (float32x4_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlals_s32 (int64_t __a, int32_t __b, int32_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_nan_scal_v4sf (__a);
++ return __builtin_aarch64_sqdmlalsi (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vminvq_f64 (float64x2_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlals_lane_s32 (int64_t __a, int32_t __b, int32x2_t __c, const int __d)
+ {
+- return __builtin_aarch64_reduc_smin_nan_scal_v2df (__a);
++ return __builtin_aarch64_sqdmlal_lanesi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vminvq_s8 (int8x16_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlals_laneq_s32 (int64_t __a, int32_t __b, int32x4_t __c, const int __d)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v16qi (__a);
++ return __builtin_aarch64_sqdmlal_laneqsi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vminvq_s16 (int16x8_t __a)
++/* vqdmlsl */
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v8hi (__a);
++ return __builtin_aarch64_sqdmlslv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vminvq_s32 (int32x4_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v4si (__a);
++ return __builtin_aarch64_sqdmlsl2v8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vminvq_u8 (uint8x16_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v16qi_uu (__a);
++ return __builtin_aarch64_sqdmlsl2_lanev8hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vminvq_u16 (uint16x8_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_laneq_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
++ int const __d)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v8hi_uu (__a);
++ return __builtin_aarch64_sqdmlsl2_laneqv8hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vminvq_u32 (uint32x4_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
+ {
+- return __builtin_aarch64_reduc_umin_scal_v4si_uu (__a);
++ return __builtin_aarch64_sqdmlsl2_nv8hi (__a, __b, __c);
+ }
+
+-/* vminnmv */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vminnmv_f32 (float32x2_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v2sf (__a);
++ return __builtin_aarch64_sqdmlsl_lanev4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vminnmvq_f32 (float32x4_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_laneq_s16 (int32x4_t __a, int16x4_t __b, int16x8_t __c, int const __d)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v4sf (__a);
++ return __builtin_aarch64_sqdmlsl_laneqv4hi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vminnmvq_f64 (float64x2_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
+ {
+- return __builtin_aarch64_reduc_smin_scal_v2df (__a);
++ return __builtin_aarch64_sqdmlsl_nv4hi (__a, __b, __c);
+ }
+
+-/* vmla */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmla_f32 (float32x2_t a, float32x2_t b, float32x2_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
+ {
+- return a + b * c;
++ return __builtin_aarch64_sqdmlslv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmla_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
+ {
+- return __a + __b * __c;
++ return __builtin_aarch64_sqdmlsl2v4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlaq_f32 (float32x4_t a, float32x4_t b, float32x4_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
++ int const __d)
+ {
+- return a + b * c;
++ return __builtin_aarch64_sqdmlsl2_lanev4si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmlaq_f64 (float64x2_t a, float64x2_t b, float64x2_t c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_laneq_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
++ int const __d)
+ {
+- return a + b * c;
++ return __builtin_aarch64_sqdmlsl2_laneqv4si (__a, __b, __c, __d);
+ }
+
+-/* vmla_lane */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmla_lane_f32 (float32x2_t __a, float32x2_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl2_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmla_lane_s16 (int16x4_t __a, int16x4_t __b,
+- int16x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_lanev2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmla_lane_s32 (int32x2_t __a, int32x2_t __b,
+- int32x2_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_laneq_s32 (int64x2_t __a, int32x2_t __b, int32x4_t __c, int const __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_laneqv2si (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmla_lane_u16 (uint16x4_t __a, uint16x4_t __b,
+- uint16x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmla_lane_u32 (uint32x2_t __a, uint32x2_t __b,
+- uint32x2_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlslh_s16 (int32_t __a, int16_t __b, int16_t __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlslhi (__a, __b, __c);
+ }
+
+-/* vmla_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmla_laneq_f32 (float32x2_t __a, float32x2_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlslh_lane_s16 (int32_t __a, int16_t __b, int16x4_t __c, const int __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_lanehi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmla_laneq_s16 (int16x4_t __a, int16x4_t __b,
+- int16x8_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlslh_laneq_s16 (int32_t __a, int16_t __b, int16x8_t __c, const int __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_laneqhi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmla_laneq_s32 (int32x2_t __a, int32x2_t __b,
+- int32x4_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsls_s32 (int64_t __a, int32_t __b, int32_t __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlslsi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmla_laneq_u16 (uint16x4_t __a, uint16x4_t __b,
+- uint16x8_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsls_lane_s32 (int64_t __a, int32_t __b, int32x2_t __c, const int __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_lanesi (__a, __b, __c, __d);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmla_laneq_u32 (uint32x2_t __a, uint32x2_t __b,
+- uint32x4_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmlsls_laneq_s32 (int64_t __a, int32_t __b, int32x4_t __c, const int __d)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmlsl_laneqsi (__a, __b, __c, __d);
+ }
+
+-/* vmlaq_lane */
++/* vqdmulh */
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlaq_lane_f32 (float32x4_t __a, float32x4_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanev4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlaq_lane_s16 (int16x8_t __a, int16x8_t __b,
+- int16x4_t __c, const int __lane)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanev2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlaq_lane_s32 (int32x4_t __a, int32x4_t __b,
+- int32x2_t __c, const int __lane)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanev8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlaq_lane_u16 (uint16x8_t __a, uint16x8_t __b,
+- uint16x4_t __c, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanev4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlaq_lane_u32 (uint32x4_t __a, uint32x4_t __b,
+- uint32x2_t __c, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhh_s16 (int16_t __a, int16_t __b)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return (int16_t) __builtin_aarch64_sqdmulhhi (__a, __b);
+ }
+
+- /* vmlaq_laneq */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlaq_laneq_f32 (float32x4_t __a, float32x4_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanehi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlaq_laneq_s16 (int16x8_t __a, int16x8_t __b,
+- int16x8_t __c, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_laneqhi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlaq_laneq_s32 (int32x4_t __a, int32x4_t __b,
+- int32x4_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhs_s32 (int32_t __a, int32_t __b)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return (int32_t) __builtin_aarch64_sqdmulhsi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlaq_laneq_u16 (uint16x8_t __a, uint16x8_t __b,
+- uint16x8_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhs_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_lanesi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlaq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
+- uint32x4_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
+ {
+- return (__a + (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmulh_laneqsi (__a, __b, __c);
+ }
+
+-/* vmls */
++/* vqdmull */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmls_f32 (float32x2_t a, float32x2_t b, float32x2_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return a - b * c;
++ return __builtin_aarch64_sqdmullv4hi (__a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmls_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __a - __b * __c;
++ return __builtin_aarch64_sqdmull2v8hi (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlsq_f32 (float32x4_t a, float32x4_t b, float32x4_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_lane_s16 (int16x8_t __a, int16x4_t __b, int const __c)
+ {
+- return a - b * c;
++ return __builtin_aarch64_sqdmull2_lanev8hi (__a, __b,__c);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmlsq_f64 (float64x2_t a, float64x2_t b, float64x2_t c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_laneq_s16 (int16x8_t __a, int16x8_t __b, int const __c)
+ {
+- return a - b * c;
++ return __builtin_aarch64_sqdmull2_laneqv8hi (__a, __b,__c);
+ }
+
+-/* vmls_lane */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmls_lane_f32 (float32x2_t __a, float32x2_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_n_s16 (int16x8_t __a, int16_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull2_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmls_lane_s16 (int16x4_t __a, int16x4_t __b,
+- int16x4_t __c, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_lane_s16 (int16x4_t __a, int16x4_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_lanev4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmls_lane_s32 (int32x2_t __a, int32x2_t __b,
+- int32x2_t __c, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_laneq_s16 (int16x4_t __a, int16x8_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_laneqv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmls_lane_u16 (uint16x4_t __a, uint16x4_t __b,
+- uint16x4_t __c, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_n_s16 (int16x4_t __a, int16_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_nv4hi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmls_lane_u32 (uint32x2_t __a, uint32x2_t __b,
+- uint32x2_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmullv2si (__a, __b);
+ }
+
+-/* vmls_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmls_laneq_f32 (float32x2_t __a, float32x2_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull2v4si (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmls_laneq_s16 (int16x4_t __a, int16x4_t __b,
+- int16x8_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_lane_s32 (int32x4_t __a, int32x2_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull2_lanev4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmls_laneq_s32 (int32x2_t __a, int32x2_t __b,
+- int32x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_laneq_s32 (int32x4_t __a, int32x4_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull2_laneqv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmls_laneq_u16 (uint16x4_t __a, uint16x4_t __b,
+- uint16x8_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_high_n_s32 (int32x4_t __a, int32_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull2_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmls_laneq_u32 (uint32x2_t __a, uint32x2_t __b,
+- uint32x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_lane_s32 (int32x2_t __a, int32x2_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_lanev2si (__a, __b, __c);
+ }
+
+-/* vmlsq_lane */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlsq_lane_f32 (float32x4_t __a, float32x4_t __b,
+- float32x2_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_laneq_s32 (int32x2_t __a, int32x4_t __b, int const __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_laneqv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsq_lane_s16 (int16x8_t __a, int16x8_t __b,
+- int16x4_t __c, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmull_n_s32 (int32x2_t __a, int32_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_nv2si (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsq_lane_s32 (int32x4_t __a, int32x4_t __b,
+- int32x2_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmullh_s16 (int16_t __a, int16_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return (int32_t) __builtin_aarch64_sqdmullhi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsq_lane_u16 (uint16x8_t __a, uint16x8_t __b,
+- uint16x4_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmullh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_lanehi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsq_lane_u32 (uint32x4_t __a, uint32x4_t __b,
+- uint32x2_t __c, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmullh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_laneqhi (__a, __b, __c);
+ }
+
+- /* vmlsq_laneq */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmlsq_laneq_f32 (float32x4_t __a, float32x4_t __b,
+- float32x4_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulls_s32 (int32_t __a, int32_t __b)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmullsi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmlsq_laneq_s16 (int16x8_t __a, int16x8_t __b,
+- int16x8_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulls_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_lanesi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmlsq_laneq_s32 (int32x4_t __a, int32x4_t __b,
+- int32x4_t __c, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqdmulls_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return __builtin_aarch64_sqdmull_laneqsi (__a, __b, __c);
+ }
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmlsq_laneq_u16 (uint16x8_t __a, uint16x8_t __b,
+- uint16x8_t __c, const int __lane)
++
++/* vqmovn */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_s16 (int16x8_t __a)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return (int8x8_t) __builtin_aarch64_sqmovnv8hi (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmlsq_laneq_u32 (uint32x4_t __a, uint32x4_t __b,
+- uint32x4_t __c, const int __lane)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_s32 (int32x4_t __a)
+ {
+- return (__a - (__b * __aarch64_vget_lane_any (__c, __lane)));
++ return (int16x4_t) __builtin_aarch64_sqmovnv4si (__a);
+ }
+
+-/* vmov_n_ */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmov_n_f32 (float32_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_s64 (int64x2_t __a)
+ {
+- return vdup_n_f32 (__a);
++ return (int32x2_t) __builtin_aarch64_sqmovnv2di (__a);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmov_n_f64 (float64_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_u16 (uint16x8_t __a)
+ {
+- return (float64x1_t) {__a};
++ return (uint8x8_t) __builtin_aarch64_uqmovnv8hi ((int16x8_t) __a);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vmov_n_p8 (poly8_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_u32 (uint32x4_t __a)
+ {
+- return vdup_n_p8 (__a);
++ return (uint16x4_t) __builtin_aarch64_uqmovnv4si ((int32x4_t) __a);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vmov_n_p16 (poly16_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovn_u64 (uint64x2_t __a)
+ {
+- return vdup_n_p16 (__a);
++ return (uint32x2_t) __builtin_aarch64_uqmovnv2di ((int64x2_t) __a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vmov_n_s8 (int8_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovnh_s16 (int16_t __a)
+ {
+- return vdup_n_s8 (__a);
++ return (int8_t) __builtin_aarch64_sqmovnhi (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmov_n_s16 (int16_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovns_s32 (int32_t __a)
+ {
+- return vdup_n_s16 (__a);
++ return (int16_t) __builtin_aarch64_sqmovnsi (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmov_n_s32 (int32_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovnd_s64 (int64_t __a)
+ {
+- return vdup_n_s32 (__a);
++ return (int32_t) __builtin_aarch64_sqmovndi (__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vmov_n_s64 (int64_t __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovnh_u16 (uint16_t __a)
+ {
+- return (int64x1_t) {__a};
++ return (uint8_t) __builtin_aarch64_uqmovnhi (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vmov_n_u8 (uint8_t __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovns_u32 (uint32_t __a)
+ {
+- return vdup_n_u8 (__a);
++ return (uint16_t) __builtin_aarch64_uqmovnsi (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmov_n_u16 (uint16_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovnd_u64 (uint64_t __a)
+ {
+- return vdup_n_u16 (__a);
++ return (uint32_t) __builtin_aarch64_uqmovndi (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmov_n_u32 (uint32_t __a)
++/* vqmovun */
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_s16 (int16x8_t __a)
+ {
+- return vdup_n_u32 (__a);
++ return (uint8x8_t) __builtin_aarch64_sqmovunv8hi (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vmov_n_u64 (uint64_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_s32 (int32x4_t __a)
+ {
+- return (uint64x1_t) {__a};
++ return (uint16x4_t) __builtin_aarch64_sqmovunv4si (__a);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmovq_n_f32 (float32_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovun_s64 (int64x2_t __a)
+ {
+- return vdupq_n_f32 (__a);
++ return (uint32x2_t) __builtin_aarch64_sqmovunv2di (__a);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmovq_n_f64 (float64_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovunh_s16 (int16_t __a)
+ {
+- return vdupq_n_f64 (__a);
++ return (int8_t) __builtin_aarch64_sqmovunhi (__a);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vmovq_n_p8 (poly8_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovuns_s32 (int32_t __a)
+ {
+- return vdupq_n_p8 (__a);
++ return (int16_t) __builtin_aarch64_sqmovunsi (__a);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vmovq_n_p16 (poly16_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqmovund_s64 (int64_t __a)
+ {
+- return vdupq_n_p16 (__a);
++ return (int32_t) __builtin_aarch64_sqmovundi (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vmovq_n_s8 (int8_t __a)
++/* vqneg */
++
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqnegq_s64 (int64x2_t __a)
+ {
+- return vdupq_n_s8 (__a);
++ return (int64x2_t) __builtin_aarch64_sqnegv2di (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmovq_n_s16 (int16_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqnegb_s8 (int8_t __a)
+ {
+- return vdupq_n_s16 (__a);
++ return (int8_t) __builtin_aarch64_sqnegqi (__a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmovq_n_s32 (int32_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqnegh_s16 (int16_t __a)
+ {
+- return vdupq_n_s32 (__a);
++ return (int16_t) __builtin_aarch64_sqneghi (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vmovq_n_s64 (int64_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqnegs_s32 (int32_t __a)
+ {
+- return vdupq_n_s64 (__a);
++ return (int32_t) __builtin_aarch64_sqnegsi (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vmovq_n_u8 (uint8_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqnegd_s64 (int64_t __a)
+ {
+- return vdupq_n_u8 (__a);
++ return __builtin_aarch64_sqnegdi (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmovq_n_u16 (uint16_t __a)
++/* vqrdmulh */
++
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- return vdupq_n_u16 (__a);
++ return __builtin_aarch64_sqrdmulh_lanev4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmovq_n_u32 (uint32_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- return vdupq_n_u32 (__a);
++ return __builtin_aarch64_sqrdmulh_lanev2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vmovq_n_u64 (uint64_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
+ {
+- return vdupq_n_u64 (__a);
++ return __builtin_aarch64_sqrdmulh_lanev8hi (__a, __b, __c);
+ }
+
+-/* vmul_lane */
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
++{
++ return __builtin_aarch64_sqrdmulh_lanev4si (__a, __b, __c);
++}
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmul_lane_f32 (float32x2_t __a, float32x2_t __b, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhh_s16 (int16_t __a, int16_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return (int16_t) __builtin_aarch64_sqrdmulhhi (__a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmul_lane_f64 (float64x1_t __a, float64x1_t __b, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
+ {
+- return __a * __b;
++ return __builtin_aarch64_sqrdmulh_lanehi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmul_lane_s16 (int16x4_t __a, int16x4_t __b, const int __lane)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrdmulh_laneqhi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmul_lane_s32 (int32x2_t __a, int32x2_t __b, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhs_s32 (int32_t __a, int32_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return (int32_t) __builtin_aarch64_sqrdmulhsi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmul_lane_u16 (uint16x4_t __a, uint16x4_t __b, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhs_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrdmulh_lanesi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmul_lane_u32 (uint32x2_t __a, uint32x2_t __b, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrdmulh_laneqsi (__a, __b, __c);
+ }
+
+-/* vmuld_lane */
++/* vqrshl */
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmuld_lane_f64 (float64_t __a, float64x1_t __b, const int __lane)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv8qi (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vmuld_laneq_f64 (float64_t __a, float64x2_t __b, const int __lane)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv4hi (__a, __b);
+ }
+
+-/* vmuls_lane */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmuls_lane_f32 (float32_t __a, float32x2_t __b, const int __lane)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv2si (__a, __b);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vmuls_laneq_f32 (float32_t __a, float32x4_t __b, const int __lane)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return (int64x1_t) {__builtin_aarch64_sqrshldi (__a[0], __b[0])};
+ }
+
+-/* vmul_laneq */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vmul_laneq_f32 (float32x2_t __a, float32x4_t __b, const int __lane)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_u8 (uint8x8_t __a, int8x8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv8qi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmul_laneq_f64 (float64x1_t __a, float64x2_t __b, const int __lane)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_u16 (uint16x4_t __a, int16x4_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv4hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vmul_laneq_s16 (int16x4_t __a, int16x8_t __b, const int __lane)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_u32 (uint32x2_t __a, int32x2_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv2si_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vmul_laneq_s32 (int32x2_t __a, int32x4_t __b, const int __lane)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshl_u64 (uint64x1_t __a, int64x1_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return (uint64x1_t) {__builtin_aarch64_uqrshldi_uus (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vmul_laneq_u16 (uint16x4_t __a, uint16x8_t __b, const int __lane)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv16qi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vmul_laneq_u32 (uint32x2_t __a, uint32x4_t __b, const int __lane)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv8hi (__a, __b);
+ }
+
+-/* vmul_n */
+-
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vmul_n_f64 (float64x1_t __a, float64_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return (float64x1_t) { vget_lane_f64 (__a, 0) * __b };
++ return __builtin_aarch64_sqrshlv4si (__a, __b);
+ }
+
+-/* vmulq_lane */
+-
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulq_lane_f32 (float32x4_t __a, float32x2_t __b, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlv2di (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulq_lane_f64 (float64x2_t __a, float64x1_t __b, const int __lane)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_u8 (uint8x16_t __a, int8x16_t __b)
+ {
+- __AARCH64_LANE_CHECK (__a, __lane);
+- return __a * __b[0];
++ return __builtin_aarch64_uqrshlv16qi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmulq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __lane)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_u16 (uint16x8_t __a, int16x8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv8hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmulq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __lane)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_u32 (uint32x4_t __a, int32x4_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv4si_uus ( __a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmulq_lane_u16 (uint16x8_t __a, uint16x4_t __b, const int __lane)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlq_u64 (uint64x2_t __a, int64x2_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlv2di_uus ( __a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmulq_lane_u32 (uint32x4_t __a, uint32x2_t __b, const int __lane)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlb_s8 (int8_t __a, int8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlqi (__a, __b);
+ }
+
+-/* vmulq_laneq */
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlh_s16 (int16_t __a, int16_t __b)
++{
++ return __builtin_aarch64_sqrshlhi (__a, __b);
++}
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vmulq_laneq_f32 (float32x4_t __a, float32x4_t __b, const int __lane)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshls_s32 (int32_t __a, int32_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshlsi (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vmulq_laneq_f64 (float64x2_t __a, float64x2_t __b, const int __lane)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshld_s64 (int64_t __a, int64_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_sqrshldi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vmulq_laneq_s16 (int16x8_t __a, int16x8_t __b, const int __lane)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlb_u8 (uint8_t __a, uint8_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlqi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vmulq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __lane)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshlh_u16 (uint16_t __a, uint16_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlhi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vmulq_laneq_u16 (uint16x8_t __a, uint16x8_t __b, const int __lane)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshls_u32 (uint32_t __a, uint32_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshlsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vmulq_laneq_u32 (uint32x4_t __a, uint32x4_t __b, const int __lane)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshld_u64 (uint64_t __a, uint64_t __b)
+ {
+- return __a * __aarch64_vget_lane_any (__b, __lane);
++ return __builtin_aarch64_uqrshldi_uus (__a, __b);
+ }
+
+-/* vneg */
++/* vqrshrn */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vneg_f32 (float32x2_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_s16 (int16x8_t __a, const int __b)
+ {
+- return -__a;
++ return (int8x8_t) __builtin_aarch64_sqrshrn_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vneg_f64 (float64x1_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_s32 (int32x4_t __a, const int __b)
+ {
+- return -__a;
++ return (int16x4_t) __builtin_aarch64_sqrshrn_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vneg_s8 (int8x8_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_s64 (int64x2_t __a, const int __b)
+ {
+- return -__a;
++ return (int32x2_t) __builtin_aarch64_sqrshrn_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vneg_s16 (int16x4_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_nv8hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vneg_s32 (int32x2_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_nv4si_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vneg_s64 (int64x1_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrn_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_nv2di_uus ( __a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vnegq_f32 (float32x4_t __a)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrnh_n_s16 (int16_t __a, const int __b)
+ {
+- return -__a;
++ return (int8_t) __builtin_aarch64_sqrshrn_nhi (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vnegq_f64 (float64x2_t __a)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrns_n_s32 (int32_t __a, const int __b)
+ {
+- return -__a;
++ return (int16_t) __builtin_aarch64_sqrshrn_nsi (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vnegq_s8 (int8x16_t __a)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrnd_n_s64 (int64_t __a, const int __b)
+ {
+- return -__a;
++ return (int32_t) __builtin_aarch64_sqrshrn_ndi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vnegq_s16 (int16x8_t __a)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrnh_n_u16 (uint16_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_nhi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vnegq_s32 (int32x4_t __a)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrns_n_u32 (uint32_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_nsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vnegq_s64 (int64x2_t __a)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrnd_n_u64 (uint64_t __a, const int __b)
+ {
+- return -__a;
++ return __builtin_aarch64_uqrshrn_ndi_uus (__a, __b);
+ }
+
+-/* vpadd */
++/* vqrshrun */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vpadd_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrun_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_addpv8qi (__a, __b);
++ return (uint8x8_t) __builtin_aarch64_sqrshrun_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vpadd_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrun_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_addpv4hi (__a, __b);
++ return (uint16x4_t) __builtin_aarch64_sqrshrun_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vpadd_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrun_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_addpv2si (__a, __b);
++ return (uint32x2_t) __builtin_aarch64_sqrshrun_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vpadd_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrunh_n_s16 (int16_t __a, const int __b)
+ {
+- return (uint8x8_t) __builtin_aarch64_addpv8qi ((int8x8_t) __a,
+- (int8x8_t) __b);
++ return (int8_t) __builtin_aarch64_sqrshrun_nhi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vpadd_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshruns_n_s32 (int32_t __a, const int __b)
+ {
+- return (uint16x4_t) __builtin_aarch64_addpv4hi ((int16x4_t) __a,
+- (int16x4_t) __b);
++ return (int16_t) __builtin_aarch64_sqrshrun_nsi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vpadd_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqrshrund_n_s64 (int64_t __a, const int __b)
+ {
+- return (uint32x2_t) __builtin_aarch64_addpv2si ((int32x2_t) __a,
+- (int32x2_t) __b);
++ return (int32_t) __builtin_aarch64_sqrshrun_ndi (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vpaddd_f64 (float64x2_t __a)
+-{
+- return __builtin_aarch64_reduc_plus_scal_v2df (__a);
+-}
++/* vqshl */
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vpaddd_s64 (int64x2_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_addpdi (__a);
++ return __builtin_aarch64_sqshlv8qi (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vpaddd_u64 (uint64x2_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_addpdi ((int64x2_t) __a);
++ return __builtin_aarch64_sqshlv4hi (__a, __b);
+ }
+
+-/* vqabs */
+-
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqabsq_s64 (int64x2_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return (int64x2_t) __builtin_aarch64_sqabsv2di (__a);
++ return __builtin_aarch64_sqshlv2si (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqabsb_s8 (int8_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return (int8_t) __builtin_aarch64_sqabsqi (__a);
++ return (int64x1_t) {__builtin_aarch64_sqshldi (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqabsh_s16 (int16_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_u8 (uint8x8_t __a, int8x8_t __b)
+ {
+- return (int16_t) __builtin_aarch64_sqabshi (__a);
++ return __builtin_aarch64_uqshlv8qi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqabss_s32 (int32_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_u16 (uint16x4_t __a, int16x4_t __b)
+ {
+- return (int32_t) __builtin_aarch64_sqabssi (__a);
++ return __builtin_aarch64_uqshlv4hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqabsd_s64 (int64_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_u32 (uint32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_sqabsdi (__a);
++ return __builtin_aarch64_uqshlv2si_uus ( __a, __b);
+ }
+
+-/* vqadd */
+-
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqaddb_s8 (int8_t __a, int8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_u64 (uint64x1_t __a, int64x1_t __b)
+ {
+- return (int8_t) __builtin_aarch64_sqaddqi (__a, __b);
++ return (uint64x1_t) {__builtin_aarch64_uqshldi_uus (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqaddh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return (int16_t) __builtin_aarch64_sqaddhi (__a, __b);
++ return __builtin_aarch64_sqshlv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqadds_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return (int32_t) __builtin_aarch64_sqaddsi (__a, __b);
++ return __builtin_aarch64_sqshlv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqaddd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return __builtin_aarch64_sqadddi (__a, __b);
++ return __builtin_aarch64_sqshlv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqaddb_u8 (uint8_t __a, uint8_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- return (uint8_t) __builtin_aarch64_uqaddqi_uuu (__a, __b);
++ return __builtin_aarch64_sqshlv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqaddh_u16 (uint16_t __a, uint16_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_u8 (uint8x16_t __a, int8x16_t __b)
+ {
+- return (uint16_t) __builtin_aarch64_uqaddhi_uuu (__a, __b);
++ return __builtin_aarch64_uqshlv16qi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqadds_u32 (uint32_t __a, uint32_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_u16 (uint16x8_t __a, int16x8_t __b)
+ {
+- return (uint32_t) __builtin_aarch64_uqaddsi_uuu (__a, __b);
++ return __builtin_aarch64_uqshlv8hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqaddd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_u32 (uint32x4_t __a, int32x4_t __b)
+ {
+- return __builtin_aarch64_uqadddi_uuu (__a, __b);
++ return __builtin_aarch64_uqshlv4si_uus ( __a, __b);
+ }
+
+-/* vqdmlal */
+-
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_u64 (uint64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_aarch64_sqdmlalv4hi (__a, __b, __c);
++ return __builtin_aarch64_uqshlv2di_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlb_s8 (int8_t __a, int8_t __b)
+ {
+- return __builtin_aarch64_sqdmlal2v8hi (__a, __b, __c);
++ return __builtin_aarch64_sqshlqi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
+- int const __d)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlh_s16 (int16_t __a, int16_t __b)
+ {
+- return __builtin_aarch64_sqdmlal2_lanev8hi (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlhi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_high_laneq_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
+- int const __d)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshls_s32 (int32_t __a, int32_t __b)
+ {
+- return __builtin_aarch64_sqdmlal2_laneqv8hi (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlsi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshld_s64 (int64_t __a, int64_t __b)
+ {
+- return __builtin_aarch64_sqdmlal2_nv8hi (__a, __b, __c);
++ return __builtin_aarch64_sqshldi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlb_u8 (uint8_t __a, uint8_t __b)
+ {
+- return __builtin_aarch64_sqdmlal_lanev4hi (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshlqi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_laneq_s16 (int32x4_t __a, int16x4_t __b, int16x8_t __c, int const __d)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlh_u16 (uint16_t __a, uint16_t __b)
+ {
+- return __builtin_aarch64_sqdmlal_laneqv4hi (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshlhi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlal_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshls_u32 (uint32_t __a, uint32_t __b)
+ {
+- return __builtin_aarch64_sqdmlal_nv4hi (__a, __b, __c);
++ return __builtin_aarch64_uqshlsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshld_u64 (uint64_t __a, uint64_t __b)
+ {
+- return __builtin_aarch64_sqdmlalv2si (__a, __b, __c);
++ return __builtin_aarch64_uqshldi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_s8 (int8x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal2v4si (__a, __b, __c);
++ return (int8x8_t) __builtin_aarch64_sqshl_nv8qi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
+- int const __d)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_s16 (int16x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal2_lanev4si (__a, __b, __c, __d);
++ return (int16x4_t) __builtin_aarch64_sqshl_nv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_high_laneq_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
+- int const __d)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_s32 (int32x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal2_laneqv4si (__a, __b, __c, __d);
++ return (int32x2_t) __builtin_aarch64_sqshl_nv2si (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_s64 (int64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal2_nv4si (__a, __b, __c);
++ return (int64x1_t) {__builtin_aarch64_sqshl_ndi (__a[0], __b)};
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_u8 (uint8x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_lanev2si (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshl_nv8qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_laneq_s32 (int64x2_t __a, int32x2_t __b, int32x4_t __c, int const __d)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_u16 (uint16x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_laneqv2si (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshl_nv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlal_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_u32 (uint32x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_nv2si (__a, __b, __c);
++ return __builtin_aarch64_uqshl_nv2si_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlalh_s16 (int32_t __a, int16_t __b, int16_t __c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshl_n_u64 (uint64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlalhi (__a, __b, __c);
++ return (uint64x1_t) {__builtin_aarch64_uqshl_ndi_uus (__a[0], __b)};
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlalh_lane_s16 (int32_t __a, int16_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_s8 (int8x16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_lanehi (__a, __b, __c, __d);
++ return (int8x16_t) __builtin_aarch64_sqshl_nv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlalh_laneq_s16 (int32_t __a, int16_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_laneqhi (__a, __b, __c, __d);
++ return (int16x8_t) __builtin_aarch64_sqshl_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlals_s32 (int64_t __a, int32_t __b, int32_t __c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlalsi (__a, __b, __c);
++ return (int32x4_t) __builtin_aarch64_sqshl_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlals_lane_s32 (int64_t __a, int32_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_lanesi (__a, __b, __c, __d);
++ return (int64x2_t) __builtin_aarch64_sqshl_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlals_laneq_s32 (int64_t __a, int32_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_u8 (uint8x16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlal_laneqsi (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshl_nv16qi_uus (__a, __b);
+ }
+
+-/* vqdmlsl */
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_u16 (uint16x8_t __a, const int __b)
++{
++ return __builtin_aarch64_uqshl_nv8hi_uus (__a, __b);
++}
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlslv4hi (__a, __b, __c);
++ return __builtin_aarch64_uqshl_nv4si_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlq_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2v8hi (__a, __b, __c);
++ return __builtin_aarch64_uqshl_nv2di_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
+- int const __d)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlb_n_s8 (int8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_lanev8hi (__a, __b, __c, __d);
++ return (int8_t) __builtin_aarch64_sqshl_nqi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_laneq_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
+- int const __d)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlh_n_s16 (int16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_laneqv8hi (__a, __b, __c, __d);
++ return (int16_t) __builtin_aarch64_sqshl_nhi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshls_n_s32 (int32_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_nv8hi (__a, __b, __c);
++ return (int32_t) __builtin_aarch64_sqshl_nsi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshld_n_s64 (int64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_lanev4hi (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshl_ndi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_laneq_s16 (int32x4_t __a, int16x4_t __b, int16x8_t __c, int const __d)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlb_n_u8 (uint8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_laneqv4hi (__a, __b, __c, __d);
++ return __builtin_aarch64_uqshl_nqi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmlsl_n_s16 (int32x4_t __a, int16x4_t __b, int16_t __c)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlh_n_u16 (uint16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_nv4hi (__a, __b, __c);
++ return __builtin_aarch64_uqshl_nhi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshls_n_u32 (uint32_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlslv2si (__a, __b, __c);
++ return __builtin_aarch64_uqshl_nsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshld_n_u64 (uint64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2v4si (__a, __b, __c);
++ return __builtin_aarch64_uqshl_ndi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
+- int const __d)
++/* vqshlu */
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlu_n_s8 (int8x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_lanev4si (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlu_nv8qi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_laneq_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
+- int const __d)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlu_n_s16 (int16x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_laneqv4si (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlu_nv4hi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlu_n_s32 (int32x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl2_nv4si (__a, __b, __c);
++ return __builtin_aarch64_sqshlu_nv2si_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlu_n_s64 (int64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_lanev2si (__a, __b, __c, __d);
++ return (uint64x1_t) {__builtin_aarch64_sqshlu_ndi_uss (__a[0], __b)};
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_laneq_s32 (int64x2_t __a, int32x2_t __b, int32x4_t __c, int const __d)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshluq_n_s8 (int8x16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_laneqv2si (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlu_nv16qi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshluq_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_nv2si (__a, __b, __c);
++ return __builtin_aarch64_sqshlu_nv8hi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlslh_s16 (int32_t __a, int16_t __b, int16_t __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshluq_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlslhi (__a, __b, __c);
++ return __builtin_aarch64_sqshlu_nv4si_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlslh_lane_s16 (int32_t __a, int16_t __b, int16x4_t __c, const int __d)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshluq_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_lanehi (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlu_nv2di_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmlslh_laneq_s16 (int32_t __a, int16_t __b, int16x8_t __c, const int __d)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlub_n_s8 (int8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_laneqhi (__a, __b, __c, __d);
++ return (int8_t) __builtin_aarch64_sqshlu_nqi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlsls_s32 (int64_t __a, int32_t __b, int32_t __c)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshluh_n_s16 (int16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlslsi (__a, __b, __c);
++ return (int16_t) __builtin_aarch64_sqshlu_nhi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlsls_lane_s32 (int64_t __a, int32_t __b, int32x2_t __c, const int __d)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlus_n_s32 (int32_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_lanesi (__a, __b, __c, __d);
++ return (int32_t) __builtin_aarch64_sqshlu_nsi_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmlsls_laneq_s32 (int64_t __a, int32_t __b, int32x4_t __c, const int __d)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshlud_n_s64 (int64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmlsl_laneqsi (__a, __b, __c, __d);
++ return __builtin_aarch64_sqshlu_ndi_uss (__a, __b);
+ }
+
+-/* vqdmulh */
++/* vqshrn */
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanev4hi (__a, __b, __c);
++ return (int8x8_t) __builtin_aarch64_sqshrn_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanev2si (__a, __b, __c);
++ return (int16x4_t) __builtin_aarch64_sqshrn_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanev8hi (__a, __b, __c);
++ return (int32x2_t) __builtin_aarch64_sqshrn_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanev4si (__a, __b, __c);
++ return __builtin_aarch64_uqshrn_nv8hi_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqdmulhh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return (int16_t) __builtin_aarch64_sqdmulhhi (__a, __b);
++ return __builtin_aarch64_uqshrn_nv4si_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqdmulhh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrn_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanehi (__a, __b, __c);
++ return __builtin_aarch64_uqshrn_nv2di_uus ( __a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqdmulhh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrnh_n_s16 (int16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_laneqhi (__a, __b, __c);
++ return (int8_t) __builtin_aarch64_sqshrn_nhi (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmulhs_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrns_n_s32 (int32_t __a, const int __b)
+ {
+- return (int32_t) __builtin_aarch64_sqdmulhsi (__a, __b);
++ return (int16_t) __builtin_aarch64_sqshrn_nsi (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmulhs_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrnd_n_s64 (int64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_lanesi (__a, __b, __c);
++ return (int32_t) __builtin_aarch64_sqshrn_ndi (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrnh_n_u16 (uint16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmulh_laneqsi (__a, __b, __c);
++ return __builtin_aarch64_uqshrn_nhi_uus (__a, __b);
+ }
+
+-/* vqdmull */
+-
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrns_n_u32 (uint32_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmullv4hi (__a, __b);
++ return __builtin_aarch64_uqshrn_nsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_high_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrnd_n_u64 (uint64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull2v8hi (__a, __b);
++ return __builtin_aarch64_uqshrn_ndi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_high_lane_s16 (int16x8_t __a, int16x4_t __b, int const __c)
++/* vqshrun */
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrun_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull2_lanev8hi (__a, __b,__c);
++ return (uint8x8_t) __builtin_aarch64_sqshrun_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_high_laneq_s16 (int16x8_t __a, int16x8_t __b, int const __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrun_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull2_laneqv8hi (__a, __b,__c);
++ return (uint16x4_t) __builtin_aarch64_sqshrun_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_high_n_s16 (int16x8_t __a, int16_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrun_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull2_nv8hi (__a, __b);
++ return (uint32x2_t) __builtin_aarch64_sqshrun_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_lane_s16 (int16x4_t __a, int16x4_t __b, int const __c)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrunh_n_s16 (int16_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull_lanev4hi (__a, __b, __c);
++ return (int8_t) __builtin_aarch64_sqshrun_nhi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_laneq_s16 (int16x4_t __a, int16x8_t __b, int const __c)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshruns_n_s32 (int32_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull_laneqv4hi (__a, __b, __c);
++ return (int16_t) __builtin_aarch64_sqshrun_nsi (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqdmull_n_s16 (int16x4_t __a, int16_t __b)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqshrund_n_s64 (int64_t __a, const int __b)
+ {
+- return __builtin_aarch64_sqdmull_nv4hi (__a, __b);
++ return (int32_t) __builtin_aarch64_sqshrun_ndi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_s32 (int32x2_t __a, int32x2_t __b)
++/* vqsub */
++
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubb_s8 (int8_t __a, int8_t __b)
+ {
+- return __builtin_aarch64_sqdmullv2si (__a, __b);
++ return (int8_t) __builtin_aarch64_sqsubqi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_high_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubh_s16 (int16_t __a, int16_t __b)
+ {
+- return __builtin_aarch64_sqdmull2v4si (__a, __b);
++ return (int16_t) __builtin_aarch64_sqsubhi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_high_lane_s32 (int32x4_t __a, int32x2_t __b, int const __c)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubs_s32 (int32_t __a, int32_t __b)
+ {
+- return __builtin_aarch64_sqdmull2_lanev4si (__a, __b, __c);
++ return (int32_t) __builtin_aarch64_sqsubsi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_high_laneq_s32 (int32x4_t __a, int32x4_t __b, int const __c)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubd_s64 (int64_t __a, int64_t __b)
+ {
+- return __builtin_aarch64_sqdmull2_laneqv4si (__a, __b, __c);
++ return __builtin_aarch64_sqsubdi (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_high_n_s32 (int32x4_t __a, int32_t __b)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubb_u8 (uint8_t __a, uint8_t __b)
+ {
+- return __builtin_aarch64_sqdmull2_nv4si (__a, __b);
++ return (uint8_t) __builtin_aarch64_uqsubqi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_lane_s32 (int32x2_t __a, int32x2_t __b, int const __c)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubh_u16 (uint16_t __a, uint16_t __b)
+ {
+- return __builtin_aarch64_sqdmull_lanev2si (__a, __b, __c);
++ return (uint16_t) __builtin_aarch64_uqsubhi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_laneq_s32 (int32x2_t __a, int32x4_t __b, int const __c)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubs_u32 (uint32_t __a, uint32_t __b)
+ {
+- return __builtin_aarch64_sqdmull_laneqv2si (__a, __b, __c);
++ return (uint32_t) __builtin_aarch64_uqsubsi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqdmull_n_s32 (int32x2_t __a, int32_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqsubd_u64 (uint64_t __a, uint64_t __b)
+ {
+- return __builtin_aarch64_sqdmull_nv2si (__a, __b);
++ return __builtin_aarch64_uqsubdi_uuu (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmullh_s16 (int16_t __a, int16_t __b)
++/* vqtbl2 */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
+ {
+- return (int32_t) __builtin_aarch64_sqdmullhi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
++ return __builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmullh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2_u8 (uint8x16x2_t tab, uint8x8_t idx)
+ {
+- return __builtin_aarch64_sqdmull_lanehi (__a, __b, __c);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqdmullh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx)
+ {
+- return __builtin_aarch64_sqdmull_laneqhi (__a, __b, __c);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmulls_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqdmullsi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return __builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmulls_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2q_u8 (uint8x16x2_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqdmull_lanesi (__a, __b, __c);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (uint8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqdmulls_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqdmull_laneqsi (__a, __b, __c);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (poly8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-/* vqmovn */
++/* vqtbl3 */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqmovn_s16 (int16x8_t __a)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx)
+ {
+- return (int8x8_t) __builtin_aarch64_sqmovnv8hi (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return __builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqmovn_s32 (int32x4_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3_u8 (uint8x16x3_t tab, uint8x8_t idx)
+ {
+- return (int16x4_t) __builtin_aarch64_sqmovnv4si (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (uint8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqmovn_s64 (int64x2_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx)
+ {
+- return (int32x2_t) __builtin_aarch64_sqmovnv2di (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (poly8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqmovn_u16 (uint16x8_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx)
+ {
+- return (uint8x8_t) __builtin_aarch64_uqmovnv8hi ((int16x8_t) __a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return __builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqmovn_u32 (uint32x4_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3q_u8 (uint8x16x3_t tab, uint8x16_t idx)
+ {
+- return (uint16x4_t) __builtin_aarch64_uqmovnv4si ((int32x4_t) __a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (uint8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqmovn_u64 (uint64x2_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx)
+ {
+- return (uint32x2_t) __builtin_aarch64_uqmovnv2di ((int64x2_t) __a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (poly8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqmovnh_s16 (int16_t __a)
++/* vqtbl4 */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx)
+ {
+- return (int8_t) __builtin_aarch64_sqmovnhi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return __builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqmovns_s32 (int32_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4_u8 (uint8x16x4_t tab, uint8x8_t idx)
+ {
+- return (int16_t) __builtin_aarch64_sqmovnsi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (uint8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqmovnd_s64 (int64_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx)
+ {
+- return (int32_t) __builtin_aarch64_sqmovndi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (poly8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqmovnh_u16 (uint16_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx)
+ {
+- return (uint8_t) __builtin_aarch64_uqmovnhi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return __builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqmovns_u32 (uint32_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4q_u8 (uint8x16x4_t tab, uint8x16_t idx)
+ {
+- return (uint16_t) __builtin_aarch64_uqmovnsi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (uint8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqmovnd_u64 (uint64_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx)
+ {
+- return (uint32_t) __builtin_aarch64_uqmovndi (__a);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (poly8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+ }
+
+-/* vqmovun */
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqmovun_s16 (int16x8_t __a)
++/* vqtbx2 */
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx)
+ {
+- return (uint8x8_t) __builtin_aarch64_sqmovunv8hi (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
++ return __builtin_aarch64_tbx4v8qi (r, __o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqmovun_s32 (int32x4_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2_u8 (uint8x8_t r, uint8x16x2_t tab, uint8x8_t idx)
+ {
+- return (uint16x4_t) __builtin_aarch64_sqmovunv4si (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqmovun_s64 (int64x2_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx)
+ {
+- return (uint32x2_t) __builtin_aarch64_sqmovunv2di (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqmovunh_s16 (int16_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx)
+ {
+- return (int8_t) __builtin_aarch64_sqmovunhi (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
++ return __builtin_aarch64_tbx4v16qi (r, __o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqmovuns_s32 (int32_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2q_u8 (uint8x16_t r, uint8x16x2_t tab, uint8x16_t idx)
+ {
+- return (int16_t) __builtin_aarch64_sqmovunsi (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (uint8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqmovund_s64 (int64_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx)
+ {
+- return (int32_t) __builtin_aarch64_sqmovundi (__a);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ return (poly8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-/* vqneg */
++/* vqtbx3 */
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3_s8 (int8x8_t r, int8x16x3_t tab, uint8x8_t idx)
++{
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
++ return __builtin_aarch64_qtbx3v8qi (r, __o, (int8x8_t)idx);
++}
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqnegq_s64 (int64x2_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3_u8 (uint8x8_t r, uint8x16x3_t tab, uint8x8_t idx)
+ {
+- return (int64x2_t) __builtin_aarch64_sqnegv2di (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (uint8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqnegb_s8 (int8_t __a)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3_p8 (poly8x8_t r, poly8x16x3_t tab, uint8x8_t idx)
+ {
+- return (int8_t) __builtin_aarch64_sqnegqi (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (poly8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqnegh_s16 (int16_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3q_s8 (int8x16_t r, int8x16x3_t tab, uint8x16_t idx)
+ {
+- return (int16_t) __builtin_aarch64_sqneghi (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
++ return __builtin_aarch64_qtbx3v16qi (r, __o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqnegs_s32 (int32_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3q_u8 (uint8x16_t r, uint8x16x3_t tab, uint8x16_t idx)
+ {
+- return (int32_t) __builtin_aarch64_sqnegsi (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (uint8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqnegd_s64 (int64_t __a)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx3q_p8 (poly8x16_t r, poly8x16x3_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqnegdi (__a);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
++ return (poly8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-/* vqrdmulh */
++/* vqtbx4 */
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrdmulh_lane_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4_s8 (int8x8_t r, int8x16x4_t tab, uint8x8_t idx)
+ {
+- return __builtin_aarch64_sqrdmulh_lanev4hi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
++ return __builtin_aarch64_qtbx4v8qi (r, __o, (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4_u8 (uint8x8_t r, uint8x16x4_t tab, uint8x8_t idx)
+ {
+- return __builtin_aarch64_sqrdmulh_lanev2si (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (uint8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrdmulhq_lane_s16 (int16x8_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4_p8 (poly8x8_t r, poly8x16x4_t tab, uint8x8_t idx)
+ {
+- return __builtin_aarch64_sqrdmulh_lanev8hi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (poly8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
++ (int8x8_t)idx);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrdmulhq_lane_s32 (int32x4_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4q_s8 (int8x16_t r, int8x16x4_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqrdmulh_lanev4si (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
++ return __builtin_aarch64_qtbx4v16qi (r, __o, (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmulhh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4q_u8 (uint8x16_t r, uint8x16x4_t tab, uint8x16_t idx)
+ {
+- return (int16_t) __builtin_aarch64_sqrdmulhhi (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (uint8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmulhh_lane_s16 (int16_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vqtbx4q_p8 (poly8x16_t r, poly8x16x4_t tab, uint8x16_t idx)
+ {
+- return __builtin_aarch64_sqrdmulh_lanehi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
++ return (poly8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
++ (int8x16_t)idx);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrdmulhh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c)
+-{
+- return __builtin_aarch64_sqrdmulh_laneqhi (__a, __b, __c);
+-}
++/* vrbit */
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmulhs_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbit_p8 (poly8x8_t __a)
+ {
+- return (int32_t) __builtin_aarch64_sqrdmulhsi (__a, __b);
++ return (poly8x8_t) __builtin_aarch64_rbitv8qi ((int8x8_t) __a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmulhs_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbit_s8 (int8x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmulh_lanesi (__a, __b, __c);
++ return __builtin_aarch64_rbitv8qi (__a);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrdmulhs_laneq_s32 (int32_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbit_u8 (uint8x8_t __a)
+ {
+- return __builtin_aarch64_sqrdmulh_laneqsi (__a, __b, __c);
++ return (uint8x8_t) __builtin_aarch64_rbitv8qi ((int8x8_t) __a);
+ }
+
+-/* vqrshl */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqrshl_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbitq_p8 (poly8x16_t __a)
+ {
+- return __builtin_aarch64_sqrshlv8qi (__a, __b);
++ return (poly8x16_t) __builtin_aarch64_rbitv16qi ((int8x16_t)__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrshl_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbitq_s8 (int8x16_t __a)
+ {
+- return __builtin_aarch64_sqrshlv4hi (__a, __b);
++ return __builtin_aarch64_rbitv16qi (__a);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrshl_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrbitq_u8 (uint8x16_t __a)
+ {
+- return __builtin_aarch64_sqrshlv2si (__a, __b);
++ return (uint8x16_t) __builtin_aarch64_rbitv16qi ((int8x16_t) __a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vqrshl_s64 (int64x1_t __a, int64x1_t __b)
+-{
+- return (int64x1_t) {__builtin_aarch64_sqrshldi (__a[0], __b[0])};
+-}
++/* vrecpe */
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqrshl_u8 (uint8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpe_u32 (uint32x2_t __a)
+ {
+- return __builtin_aarch64_uqrshlv8qi_uus ( __a, __b);
++ return (uint32x2_t) __builtin_aarch64_urecpev2si ((int32x2_t) __a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqrshl_u16 (uint16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpeq_u32 (uint32x4_t __a)
+ {
+- return __builtin_aarch64_uqrshlv4hi_uus ( __a, __b);
++ return (uint32x4_t) __builtin_aarch64_urecpev4si ((int32x4_t) __a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqrshl_u32 (uint32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpes_f32 (float32_t __a)
+ {
+- return __builtin_aarch64_uqrshlv2si_uus ( __a, __b);
++ return __builtin_aarch64_frecpesf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vqrshl_u64 (uint64x1_t __a, int64x1_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecped_f64 (float64_t __a)
+ {
+- return (uint64x1_t) {__builtin_aarch64_uqrshldi_uus (__a[0], __b[0])};
++ return __builtin_aarch64_frecpedf (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqrshlq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpe_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_sqrshlv16qi (__a, __b);
++ return __builtin_aarch64_frecpev2sf (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqrshlq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpe_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_sqrshlv8hi (__a, __b);
++ return (float64x1_t) { vrecped_f64 (vget_lane_f64 (__a, 0)) };
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqrshlq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpeq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_sqrshlv4si (__a, __b);
++ return __builtin_aarch64_frecpev4sf (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqrshlq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpeq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_sqrshlv2di (__a, __b);
++ return __builtin_aarch64_frecpev2df (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqrshlq_u8 (uint8x16_t __a, int8x16_t __b)
+-{
+- return __builtin_aarch64_uqrshlv16qi_uus ( __a, __b);
+-}
++/* vrecps */
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqrshlq_u16 (uint16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpss_f32 (float32_t __a, float32_t __b)
+ {
+- return __builtin_aarch64_uqrshlv8hi_uus ( __a, __b);
++ return __builtin_aarch64_frecpssf (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqrshlq_u32 (uint32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpsd_f64 (float64_t __a, float64_t __b)
+ {
+- return __builtin_aarch64_uqrshlv4si_uus ( __a, __b);
++ return __builtin_aarch64_frecpsdf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vqrshlq_u64 (uint64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecps_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __builtin_aarch64_uqrshlv2di_uus ( __a, __b);
++ return __builtin_aarch64_frecpsv2sf (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqrshlb_s8 (int8_t __a, int8_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecps_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- return __builtin_aarch64_sqrshlqi (__a, __b);
++ return (float64x1_t) { vrecpsd_f64 (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0)) };
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrshlh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpsq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- return __builtin_aarch64_sqrshlhi (__a, __b);
++ return __builtin_aarch64_frecpsv4sf (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrshls_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpsq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- return __builtin_aarch64_sqrshlsi (__a, __b);
++ return __builtin_aarch64_frecpsv2df (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqrshld_s64 (int64_t __a, int64_t __b)
++/* vrecpx */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpxs_f32 (float32_t __a)
+ {
+- return __builtin_aarch64_sqrshldi (__a, __b);
++ return __builtin_aarch64_frecpxsf (__a);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqrshlb_u8 (uint8_t __a, uint8_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpxd_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_uqrshlqi_uus (__a, __b);
++ return __builtin_aarch64_frecpxdf (__a);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqrshlh_u16 (uint16_t __a, uint16_t __b)
++
++/* vrev */
++
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16_p8 (poly8x8_t a)
+ {
+- return __builtin_aarch64_uqrshlhi_uus (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqrshls_u32 (uint32_t __a, uint32_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16_s8 (int8x8_t a)
+ {
+- return __builtin_aarch64_uqrshlsi_uus (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqrshld_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16_u8 (uint8x8_t a)
+ {
+- return __builtin_aarch64_uqrshldi_uus (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-/* vqrshrn */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqrshrn_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16q_p8 (poly8x16_t a)
+ {
+- return (int8x8_t) __builtin_aarch64_sqrshrn_nv8hi (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqrshrn_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16q_s8 (int8x16_t a)
+ {
+- return (int16x4_t) __builtin_aarch64_sqrshrn_nv4si (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqrshrn_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev16q_u8 (uint8x16_t a)
+ {
+- return (int32x2_t) __builtin_aarch64_sqrshrn_nv2di (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqrshrn_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_p8 (poly8x8_t a)
+ {
+- return __builtin_aarch64_uqrshrn_nv8hi_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqrshrn_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_p16 (poly16x4_t a)
+ {
+- return __builtin_aarch64_uqrshrn_nv4si_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqrshrn_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_s8 (int8x8_t a)
+ {
+- return __builtin_aarch64_uqrshrn_nv2di_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqrshrnh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_s16 (int16x4_t a)
+ {
+- return (int8_t) __builtin_aarch64_sqrshrn_nhi (__a, __b);
++ return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrshrns_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_u8 (uint8x8_t a)
+ {
+- return (int16_t) __builtin_aarch64_sqrshrn_nsi (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrshrnd_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32_u16 (uint16x4_t a)
+ {
+- return (int32_t) __builtin_aarch64_sqrshrn_ndi (__a, __b);
++ return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqrshrnh_n_u16 (uint16_t __a, const int __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_p8 (poly8x16_t a)
+ {
+- return __builtin_aarch64_uqrshrn_nhi_uus (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqrshrns_n_u32 (uint32_t __a, const int __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_p16 (poly16x8_t a)
+ {
+- return __builtin_aarch64_uqrshrn_nsi_uus (__a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqrshrnd_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_s8 (int8x16_t a)
+ {
+- return __builtin_aarch64_uqrshrn_ndi_uus (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+ }
+
+-/* vqrshrun */
+-
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqrshrun_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_s16 (int16x8_t a)
+ {
+- return (uint8x8_t) __builtin_aarch64_sqrshrun_nv8hi (__a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqrshrun_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_u8 (uint8x16_t a)
+ {
+- return (uint16x4_t) __builtin_aarch64_sqrshrun_nv4si (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqrshrun_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev32q_u16 (uint16x8_t a)
+ {
+- return (uint32x2_t) __builtin_aarch64_sqrshrun_nv2di (__a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqrshrunh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_f16 (float16x4_t __a)
+ {
+- return (int8_t) __builtin_aarch64_sqrshrun_nhi (__a, __b);
++ return __builtin_shuffle (__a, (uint16x4_t) { 3, 2, 1, 0 });
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqrshruns_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_f32 (float32x2_t a)
+ {
+- return (int16_t) __builtin_aarch64_sqrshrun_nsi (__a, __b);
++ return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqrshrund_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_p8 (poly8x8_t a)
+ {
+- return (int32_t) __builtin_aarch64_sqrshrun_ndi (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+ }
+
+-/* vqshl */
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_p16 (poly16x4_t a)
++{
++ return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqshl_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_s8 (int8x8_t a)
+ {
+- return __builtin_aarch64_sqshlv8qi (__a, __b);
++ return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqshl_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_s16 (int16x4_t a)
+ {
+- return __builtin_aarch64_sqshlv4hi (__a, __b);
++ return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqshl_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_s32 (int32x2_t a)
+ {
+- return __builtin_aarch64_sqshlv2si (__a, __b);
++ return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vqshl_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_u8 (uint8x8_t a)
+ {
+- return (int64x1_t) {__builtin_aarch64_sqshldi (__a[0], __b[0])};
++ return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqshl_u8 (uint8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_u16 (uint16x4_t a)
+ {
+- return __builtin_aarch64_uqshlv8qi_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqshl_u16 (uint16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64_u32 (uint32x2_t a)
+ {
+- return __builtin_aarch64_uqshlv4hi_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqshl_u32 (uint32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_f16 (float16x8_t __a)
+ {
+- return __builtin_aarch64_uqshlv2si_uus ( __a, __b);
++ return __builtin_shuffle (__a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vqshl_u64 (uint64x1_t __a, int64x1_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_f32 (float32x4_t a)
+ {
+- return (uint64x1_t) {__builtin_aarch64_uqshldi_uus (__a[0], __b[0])};
++ return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqshlq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_p8 (poly8x16_t a)
+ {
+- return __builtin_aarch64_sqshlv16qi (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqshlq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_p16 (poly16x8_t a)
+ {
+- return __builtin_aarch64_sqshlv8hi (__a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqshlq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_s8 (int8x16_t a)
+ {
+- return __builtin_aarch64_sqshlv4si (__a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqshlq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_s16 (int16x8_t a)
+ {
+- return __builtin_aarch64_sqshlv2di (__a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqshlq_u8 (uint8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_s32 (int32x4_t a)
+ {
+- return __builtin_aarch64_uqshlv16qi_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqshlq_u16 (uint16x8_t __a, int16x8_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_u8 (uint8x16_t a)
+ {
+- return __builtin_aarch64_uqshlv8hi_uus ( __a, __b);
++ return __builtin_shuffle (a,
++ (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqshlq_u32 (uint32x4_t __a, int32x4_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_u16 (uint16x8_t a)
+ {
+- return __builtin_aarch64_uqshlv4si_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vqshlq_u64 (uint64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrev64q_u32 (uint32x4_t a)
+ {
+- return __builtin_aarch64_uqshlv2di_uus ( __a, __b);
++ return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqshlb_s8 (int8_t __a, int8_t __b)
++/* vrnd */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnd_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_sqshlqi (__a, __b);
++ return __builtin_aarch64_btruncv2sf (__a);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqshlh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnd_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_sqshlhi (__a, __b);
++ return vset_lane_f64 (__builtin_trunc (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqshls_s32 (int32_t __a, int32_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_sqshlsi (__a, __b);
++ return __builtin_aarch64_btruncv4sf (__a);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqshld_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_sqshldi (__a, __b);
++ return __builtin_aarch64_btruncv2df (__a);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqshlb_u8 (uint8_t __a, uint8_t __b)
++/* vrnda */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnda_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_uqshlqi_uus (__a, __b);
++ return __builtin_aarch64_roundv2sf (__a);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqshlh_u16 (uint16_t __a, uint16_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnda_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_uqshlhi_uus (__a, __b);
++ return vset_lane_f64 (__builtin_round (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqshls_u32 (uint32_t __a, uint32_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndaq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_uqshlsi_uus (__a, __b);
++ return __builtin_aarch64_roundv4sf (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqshld_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndaq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_uqshldi_uus (__a, __b);
++ return __builtin_aarch64_roundv2df (__a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqshl_n_s8 (int8x8_t __a, const int __b)
++/* vrndi */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndi_f32 (float32x2_t __a)
+ {
+- return (int8x8_t) __builtin_aarch64_sqshl_nv8qi (__a, __b);
++ return __builtin_aarch64_nearbyintv2sf (__a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqshl_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndi_f64 (float64x1_t __a)
+ {
+- return (int16x4_t) __builtin_aarch64_sqshl_nv4hi (__a, __b);
++ return vset_lane_f64 (__builtin_nearbyint (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqshl_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndiq_f32 (float32x4_t __a)
+ {
+- return (int32x2_t) __builtin_aarch64_sqshl_nv2si (__a, __b);
++ return __builtin_aarch64_nearbyintv4sf (__a);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vqshl_n_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndiq_f64 (float64x2_t __a)
+ {
+- return (int64x1_t) {__builtin_aarch64_sqshl_ndi (__a[0], __b)};
++ return __builtin_aarch64_nearbyintv2df (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqshl_n_u8 (uint8x8_t __a, const int __b)
++/* vrndm */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndm_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv8qi_uus (__a, __b);
++ return __builtin_aarch64_floorv2sf (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqshl_n_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndm_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv4hi_uus (__a, __b);
++ return vset_lane_f64 (__builtin_floor (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqshl_n_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndmq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv2si_uus (__a, __b);
++ return __builtin_aarch64_floorv4sf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vqshl_n_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndmq_f64 (float64x2_t __a)
+ {
+- return (uint64x1_t) {__builtin_aarch64_uqshl_ndi_uus (__a[0], __b)};
++ return __builtin_aarch64_floorv2df (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqshlq_n_s8 (int8x16_t __a, const int __b)
++/* vrndn */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndn_f32 (float32x2_t __a)
+ {
+- return (int8x16_t) __builtin_aarch64_sqshl_nv16qi (__a, __b);
++ return __builtin_aarch64_frintnv2sf (__a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vqshlq_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndn_f64 (float64x1_t __a)
+ {
+- return (int16x8_t) __builtin_aarch64_sqshl_nv8hi (__a, __b);
++ return (float64x1_t) {__builtin_aarch64_frintndf (__a[0])};
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vqshlq_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndnq_f32 (float32x4_t __a)
+ {
+- return (int32x4_t) __builtin_aarch64_sqshl_nv4si (__a, __b);
++ return __builtin_aarch64_frintnv4sf (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vqshlq_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndnq_f64 (float64x2_t __a)
+ {
+- return (int64x2_t) __builtin_aarch64_sqshl_nv2di (__a, __b);
++ return __builtin_aarch64_frintnv2df (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqshlq_n_u8 (uint8x16_t __a, const int __b)
++/* vrndp */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndp_f32 (float32x2_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv16qi_uus (__a, __b);
++ return __builtin_aarch64_ceilv2sf (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqshlq_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndp_f64 (float64x1_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv8hi_uus (__a, __b);
++ return vset_lane_f64 (__builtin_ceil (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqshlq_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndpq_f32 (float32x4_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv4si_uus (__a, __b);
++ return __builtin_aarch64_ceilv4sf (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vqshlq_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndpq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_uqshl_nv2di_uus (__a, __b);
++ return __builtin_aarch64_ceilv2df (__a);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqshlb_n_s8 (int8_t __a, const int __b)
++/* vrndx */
++
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndx_f32 (float32x2_t __a)
+ {
+- return (int8_t) __builtin_aarch64_sqshl_nqi (__a, __b);
++ return __builtin_aarch64_rintv2sf (__a);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqshlh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndx_f64 (float64x1_t __a)
+ {
+- return (int16_t) __builtin_aarch64_sqshl_nhi (__a, __b);
++ return vset_lane_f64 (__builtin_rint (vget_lane_f64 (__a, 0)), __a, 0);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqshls_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndxq_f32 (float32x4_t __a)
+ {
+- return (int32_t) __builtin_aarch64_sqshl_nsi (__a, __b);
++ return __builtin_aarch64_rintv4sf (__a);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqshld_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndxq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_sqshl_ndi (__a, __b);
++ return __builtin_aarch64_rintv2df (__a);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqshlb_n_u8 (uint8_t __a, const int __b)
++/* vrshl */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_uqshl_nqi_uus (__a, __b);
++ return (int8x8_t) __builtin_aarch64_srshlv8qi (__a, __b);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqshlh_n_u16 (uint16_t __a, const int __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_uqshl_nhi_uus (__a, __b);
++ return (int16x4_t) __builtin_aarch64_srshlv4hi (__a, __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqshls_n_u32 (uint32_t __a, const int __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_uqshl_nsi_uus (__a, __b);
++ return (int32x2_t) __builtin_aarch64_srshlv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqshld_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return __builtin_aarch64_uqshl_ndi_uus (__a, __b);
++ return (int64x1_t) {__builtin_aarch64_srshldi (__a[0], __b[0])};
+ }
+
+-/* vqshlu */
+-
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqshlu_n_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_u8 (uint8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv8qi_uss (__a, __b);
++ return __builtin_aarch64_urshlv8qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqshlu_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_u16 (uint16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv4hi_uss (__a, __b);
++ return __builtin_aarch64_urshlv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqshlu_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_u32 (uint32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv2si_uss (__a, __b);
++ return __builtin_aarch64_urshlv2si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vqshlu_n_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshl_u64 (uint64x1_t __a, int64x1_t __b)
+ {
+- return (uint64x1_t) {__builtin_aarch64_sqshlu_ndi_uss (__a[0], __b)};
++ return (uint64x1_t) {__builtin_aarch64_urshldi_uus (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqshluq_n_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv16qi_uss (__a, __b);
++ return (int8x16_t) __builtin_aarch64_srshlv16qi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vqshluq_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv8hi_uss (__a, __b);
++ return (int16x8_t) __builtin_aarch64_srshlv8hi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vqshluq_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv4si_uss (__a, __b);
++ return (int32x4_t) __builtin_aarch64_srshlv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vqshluq_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_aarch64_sqshlu_nv2di_uss (__a, __b);
++ return (int64x2_t) __builtin_aarch64_srshlv2di (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqshlub_n_s8 (int8_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_u8 (uint8x16_t __a, int8x16_t __b)
+ {
+- return (int8_t) __builtin_aarch64_sqshlu_nqi_uss (__a, __b);
++ return __builtin_aarch64_urshlv16qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqshluh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_u16 (uint16x8_t __a, int16x8_t __b)
+ {
+- return (int16_t) __builtin_aarch64_sqshlu_nhi_uss (__a, __b);
++ return __builtin_aarch64_urshlv8hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqshlus_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_u32 (uint32x4_t __a, int32x4_t __b)
+ {
+- return (int32_t) __builtin_aarch64_sqshlu_nsi_uss (__a, __b);
++ return __builtin_aarch64_urshlv4si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqshlud_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshlq_u64 (uint64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_aarch64_sqshlu_ndi_uss (__a, __b);
++ return __builtin_aarch64_urshlv2di_uus (__a, __b);
+ }
+
+-/* vqshrn */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqshrn_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshld_s64 (int64_t __a, int64_t __b)
+ {
+- return (int8x8_t) __builtin_aarch64_sqshrn_nv8hi (__a, __b);
++ return __builtin_aarch64_srshldi (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vqshrn_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshld_u64 (uint64_t __a, int64_t __b)
+ {
+- return (int16x4_t) __builtin_aarch64_sqshrn_nv4si (__a, __b);
++ return __builtin_aarch64_urshldi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vqshrn_n_s64 (int64x2_t __a, const int __b)
++/* vrshr */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_s8 (int8x8_t __a, const int __b)
+ {
+- return (int32x2_t) __builtin_aarch64_sqshrn_nv2di (__a, __b);
++ return (int8x8_t) __builtin_aarch64_srshr_nv8qi (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqshrn_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_s16 (int16x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_nv8hi_uus ( __a, __b);
++ return (int16x4_t) __builtin_aarch64_srshr_nv4hi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqshrn_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_s32 (int32x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_nv4si_uus ( __a, __b);
++ return (int32x2_t) __builtin_aarch64_srshr_nv2si (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqshrn_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_s64 (int64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_nv2di_uus ( __a, __b);
++ return (int64x1_t) {__builtin_aarch64_srshr_ndi (__a[0], __b)};
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqshrnh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_u8 (uint8x8_t __a, const int __b)
+ {
+- return (int8_t) __builtin_aarch64_sqshrn_nhi (__a, __b);
++ return __builtin_aarch64_urshr_nv8qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqshrns_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_u16 (uint16x4_t __a, const int __b)
+ {
+- return (int16_t) __builtin_aarch64_sqshrn_nsi (__a, __b);
++ return __builtin_aarch64_urshr_nv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqshrnd_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_u32 (uint32x2_t __a, const int __b)
+ {
+- return (int32_t) __builtin_aarch64_sqshrn_ndi (__a, __b);
++ return __builtin_aarch64_urshr_nv2si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqshrnh_n_u16 (uint16_t __a, const int __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshr_n_u64 (uint64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_nhi_uus (__a, __b);
++ return (uint64x1_t) {__builtin_aarch64_urshr_ndi_uus (__a[0], __b)};
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqshrns_n_u32 (uint32_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_s8 (int8x16_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_nsi_uus (__a, __b);
++ return (int8x16_t) __builtin_aarch64_srshr_nv16qi (__a, __b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqshrnd_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_uqshrn_ndi_uus (__a, __b);
++ return (int16x8_t) __builtin_aarch64_srshr_nv8hi (__a, __b);
+ }
+
+-/* vqshrun */
+-
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqshrun_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_s32 (int32x4_t __a, const int __b)
+ {
+- return (uint8x8_t) __builtin_aarch64_sqshrun_nv8hi (__a, __b);
++ return (int32x4_t) __builtin_aarch64_srshr_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vqshrun_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_s64 (int64x2_t __a, const int __b)
+ {
+- return (uint16x4_t) __builtin_aarch64_sqshrun_nv4si (__a, __b);
++ return (int64x2_t) __builtin_aarch64_srshr_nv2di (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vqshrun_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_u8 (uint8x16_t __a, const int __b)
+ {
+- return (uint32x2_t) __builtin_aarch64_sqshrun_nv2di (__a, __b);
++ return __builtin_aarch64_urshr_nv16qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqshrunh_n_s16 (int16_t __a, const int __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return (int8_t) __builtin_aarch64_sqshrun_nhi (__a, __b);
++ return __builtin_aarch64_urshr_nv8hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqshruns_n_s32 (int32_t __a, const int __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return (int16_t) __builtin_aarch64_sqshrun_nsi (__a, __b);
++ return __builtin_aarch64_urshr_nv4si_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqshrund_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrq_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return (int32_t) __builtin_aarch64_sqshrun_ndi (__a, __b);
++ return __builtin_aarch64_urshr_nv2di_uus (__a, __b);
+ }
+
+-/* vqsub */
+-
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vqsubb_s8 (int8_t __a, int8_t __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrd_n_s64 (int64_t __a, const int __b)
+ {
+- return (int8_t) __builtin_aarch64_sqsubqi (__a, __b);
++ return __builtin_aarch64_srshr_ndi (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vqsubh_s16 (int16_t __a, int16_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrshrd_n_u64 (uint64_t __a, const int __b)
+ {
+- return (int16_t) __builtin_aarch64_sqsubhi (__a, __b);
++ return __builtin_aarch64_urshr_ndi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vqsubs_s32 (int32_t __a, int32_t __b)
++/* vrsqrte. */
++
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtes_f32 (float32_t __a)
+ {
+- return (int32_t) __builtin_aarch64_sqsubsi (__a, __b);
++ return __builtin_aarch64_rsqrtesf (__a);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vqsubd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrted_f64 (float64_t __a)
+ {
+- return __builtin_aarch64_sqsubdi (__a, __b);
++ return __builtin_aarch64_rsqrtedf (__a);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vqsubb_u8 (uint8_t __a, uint8_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrte_f32 (float32x2_t __a)
+ {
+- return (uint8_t) __builtin_aarch64_uqsubqi_uuu (__a, __b);
++ return __builtin_aarch64_rsqrtev2sf (__a);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vqsubh_u16 (uint16_t __a, uint16_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrte_f64 (float64x1_t __a)
+ {
+- return (uint16_t) __builtin_aarch64_uqsubhi_uuu (__a, __b);
++ return (float64x1_t) {vrsqrted_f64 (vget_lane_f64 (__a, 0))};
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vqsubs_u32 (uint32_t __a, uint32_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrteq_f32 (float32x4_t __a)
+ {
+- return (uint32_t) __builtin_aarch64_uqsubsi_uuu (__a, __b);
++ return __builtin_aarch64_rsqrtev4sf (__a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vqsubd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrteq_f64 (float64x2_t __a)
+ {
+- return __builtin_aarch64_uqsubdi_uuu (__a, __b);
++ return __builtin_aarch64_rsqrtev2df (__a);
+ }
+
+-/* vqtbl2 */
++/* vrsqrts. */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline float32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtss_f32 (float32_t __a, float32_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+- return __builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return __builtin_aarch64_rsqrtssf (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbl2_u8 (uint8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline float64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtsd_f64 (float64_t __a, float64_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return __builtin_aarch64_rsqrtsdf (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrts_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
++ return __builtin_aarch64_rsqrtsv2sf (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrts_f64 (float64x1_t __a, float64x1_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return __builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
++ return (float64x1_t) {vrsqrtsd_f64 (vget_lane_f64 (__a, 0),
++ vget_lane_f64 (__b, 0))};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbl2q_u8 (uint8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtsq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (uint8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
++ return __builtin_aarch64_rsqrtsv4sf (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtsq_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (poly8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
++ return __builtin_aarch64_rsqrtsv2df (__a, __b);
+ }
+
+-/* vqtbl3 */
++/* vrsra */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return __builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
++ return (int8x8_t) __builtin_aarch64_srsra_nv8qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbl3_u8 (uint8x16x3_t tab, uint8x8_t idx)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (uint8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
++ return (int16x4_t) __builtin_aarch64_srsra_nv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (poly8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
++ return (int32x2_t) __builtin_aarch64_srsra_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return __builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
++ return (int64x1_t) {__builtin_aarch64_srsra_ndi (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbl3q_u8 (uint8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (uint8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
++ return __builtin_aarch64_ursra_nv8qi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (poly8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
++ return __builtin_aarch64_ursra_nv4hi_uuus (__a, __b, __c);
+ }
+
+-/* vqtbl4 */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return __builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
++ return __builtin_aarch64_ursra_nv2si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbl4_u8 (uint8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (uint8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
++ return (uint64x1_t) {__builtin_aarch64_ursra_ndi_uuus (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (poly8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
++ return (int8x16_t) __builtin_aarch64_srsra_nv16qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return __builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
++ return (int16x8_t) __builtin_aarch64_srsra_nv8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbl4q_u8 (uint8x16x4_t tab, uint8x16_t idx)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (uint8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
++ return (int32x4_t) __builtin_aarch64_srsra_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (poly8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
++ return (int64x2_t) __builtin_aarch64_srsra_nv2di (__a, __b, __c);
+ }
+
+-
+-/* vqtbx2 */
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+- return __builtin_aarch64_tbx4v8qi (r, __o, (int8x8_t)idx);
++ return __builtin_aarch64_ursra_nv16qi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbx2_u8 (uint8x8_t r, uint8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_ursra_nv8hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_ursra_nv4si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+- return __builtin_aarch64_tbx4v16qi (r, __o, (int8x16_t)idx);
++ return __builtin_aarch64_ursra_nv2di_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbx2q_u8 (uint8x16_t r, uint8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsrad_n_s64 (int64_t __a, int64_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (uint8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return __builtin_aarch64_srsra_ndi (__a, __b, __c);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- return (poly8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return __builtin_aarch64_ursra_ndi_uuus (__a, __b, __c);
+ }
+
+-/* vqtbx3 */
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbx3_s8 (int8x8_t r, int8x16x3_t tab, uint8x8_t idx)
++#pragma GCC push_options
++#pragma GCC target ("+nothing+crypto")
++
++/* vsha1 */
++
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1cq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
+- return __builtin_aarch64_qtbx3v8qi (r, __o, (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha1cv4si_uuuu (hash_abcd, hash_e, wk);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbx3_u8 (uint8x8_t r, uint8x16x3_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1mq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (uint8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha1mv4si_uuuu (hash_abcd, hash_e, wk);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbx3_p8 (poly8x8_t r, poly8x16x3_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1pq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (poly8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha1pv4si_uuuu (hash_abcd, hash_e, wk);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbx3q_s8 (int8x16_t r, int8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1h_u32 (uint32_t hash_e)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
+- return __builtin_aarch64_qtbx3v16qi (r, __o, (int8x16_t)idx);
++ return __builtin_aarch64_crypto_sha1hsi_uu (hash_e);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbx3q_u8 (uint8x16_t r, uint8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7, uint32x4_t w8_11)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (uint8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return __builtin_aarch64_crypto_sha1su0v4si_uuuu (w0_3, w4_7, w8_11);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbx3q_p8 (poly8x16_t r, poly8x16x3_t tab, uint8x16_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha1su1q_u32 (uint32x4_t tw0_3, uint32x4_t w12_15)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+- return (poly8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return __builtin_aarch64_crypto_sha1su1v4si_uuu (tw0_3, w12_15);
+ }
+
+-/* vqtbx4 */
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha256hq_u32 (uint32x4_t hash_abcd, uint32x4_t hash_efgh, uint32x4_t wk)
++{
++ return __builtin_aarch64_crypto_sha256hv4si_uuuu (hash_abcd, hash_efgh, wk);
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vqtbx4_s8 (int8x8_t r, int8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha256h2q_u32 (uint32x4_t hash_efgh, uint32x4_t hash_abcd, uint32x4_t wk)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
+- return __builtin_aarch64_qtbx4v8qi (r, __o, (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha256h2v4si_uuuu (hash_efgh, hash_abcd, wk);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vqtbx4_u8 (uint8x8_t r, uint8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha256su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (uint8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha256su0v4si_uuu (w0_3, w4_7);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vqtbx4_p8 (poly8x8_t r, poly8x16x4_t tab, uint8x8_t idx)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsha256su1q_u32 (uint32x4_t tw0_3, uint32x4_t w8_11, uint32x4_t w12_15)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (poly8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
+- (int8x8_t)idx);
++ return __builtin_aarch64_crypto_sha256su1v4si_uuuu (tw0_3, w8_11, w12_15);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vqtbx4q_s8 (int8x16_t r, int8x16x4_t tab, uint8x16_t idx)
++__extension__ extern __inline poly128_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_p64 (poly64_t a, poly64_t b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
+- return __builtin_aarch64_qtbx4v16qi (r, __o, (int8x16_t)idx);
++ return
++ __builtin_aarch64_crypto_pmulldi_ppp (a, b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vqtbx4q_u8 (uint8x16_t r, uint8x16x4_t tab, uint8x16_t idx)
++__extension__ extern __inline poly128_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmull_high_p64 (poly64x2_t a, poly64x2_t b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (uint8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return __builtin_aarch64_crypto_pmullv2di_ppp (a, b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vqtbx4q_p8 (poly8x16_t r, poly8x16x4_t tab, uint8x16_t idx)
++#pragma GCC pop_options
++
++/* vshl */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_s8 (int8x8_t __a, const int __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+- return (poly8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
+- (int8x16_t)idx);
++ return (int8x8_t) __builtin_aarch64_ashlv8qi (__a, __b);
+ }
+
+-/* vrbit */
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_s16 (int16x4_t __a, const int __b)
++{
++ return (int16x4_t) __builtin_aarch64_ashlv4hi (__a, __b);
++}
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vrbit_p8 (poly8x8_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_s32 (int32x2_t __a, const int __b)
+ {
+- return (poly8x8_t) __builtin_aarch64_rbitv8qi ((int8x8_t) __a);
++ return (int32x2_t) __builtin_aarch64_ashlv2si (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrbit_s8 (int8x8_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_s64 (int64x1_t __a, const int __b)
+ {
+- return __builtin_aarch64_rbitv8qi (__a);
++ return (int64x1_t) {__builtin_aarch64_ashldi (__a[0], __b)};
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrbit_u8 (uint8x8_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_u8 (uint8x8_t __a, const int __b)
+ {
+- return (uint8x8_t) __builtin_aarch64_rbitv8qi ((int8x8_t) __a);
++ return (uint8x8_t) __builtin_aarch64_ashlv8qi ((int8x8_t) __a, __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vrbitq_p8 (poly8x16_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_u16 (uint16x4_t __a, const int __b)
+ {
+- return (poly8x16_t) __builtin_aarch64_rbitv16qi ((int8x16_t)__a);
++ return (uint16x4_t) __builtin_aarch64_ashlv4hi ((int16x4_t) __a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrbitq_s8 (int8x16_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_u32 (uint32x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_rbitv16qi (__a);
++ return (uint32x2_t) __builtin_aarch64_ashlv2si ((int32x2_t) __a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrbitq_u8 (uint8x16_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_n_u64 (uint64x1_t __a, const int __b)
+ {
+- return (uint8x16_t) __builtin_aarch64_rbitv16qi ((int8x16_t) __a);
++ return (uint64x1_t) {__builtin_aarch64_ashldi ((int64_t) __a[0], __b)};
+ }
+
+-/* vrecpe */
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_s8 (int8x16_t __a, const int __b)
++{
++ return (int8x16_t) __builtin_aarch64_ashlv16qi (__a, __b);
++}
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrecpe_u32 (uint32x2_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_s16 (int16x8_t __a, const int __b)
+ {
+- return (uint32x2_t) __builtin_aarch64_urecpev2si ((int32x2_t) __a);
++ return (int16x8_t) __builtin_aarch64_ashlv8hi (__a, __b);
+ }
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrecpeq_u32 (uint32x4_t __a)
++
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_s32 (int32x4_t __a, const int __b)
+ {
+- return (uint32x4_t) __builtin_aarch64_urecpev4si ((int32x4_t) __a);
++ return (int32x4_t) __builtin_aarch64_ashlv4si (__a, __b);
+ }
+
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vrecpes_f32 (float32_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpesf (__a);
++ return (int64x2_t) __builtin_aarch64_ashlv2di (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vrecped_f64 (float64_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_u8 (uint8x16_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpedf (__a);
++ return (uint8x16_t) __builtin_aarch64_ashlv16qi ((int8x16_t) __a, __b);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrecpe_f32 (float32x2_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpev2sf (__a);
++ return (uint16x8_t) __builtin_aarch64_ashlv8hi ((int16x8_t) __a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrecpeq_f32 (float32x4_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpev4sf (__a);
++ return (uint32x4_t) __builtin_aarch64_ashlv4si ((int32x4_t) __a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrecpeq_f64 (float64x2_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpev2df (__a);
++ return (uint64x2_t) __builtin_aarch64_ashlv2di ((int64x2_t) __a, __b);
+ }
+
+-/* vrecps */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vrecpss_f32 (float32_t __a, float32_t __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshld_n_s64 (int64_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpssf (__a, __b);
++ return __builtin_aarch64_ashldi (__a, __b);
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vrecpsd_f64 (float64_t __a, float64_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshld_n_u64 (uint64_t __a, const int __b)
+ {
+- return __builtin_aarch64_frecpsdf (__a, __b);
++ return (uint64_t) __builtin_aarch64_ashldi (__a, __b);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrecps_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_frecpsv2sf (__a, __b);
++ return __builtin_aarch64_sshlv8qi (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrecpsq_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_frecpsv4sf (__a, __b);
++ return __builtin_aarch64_sshlv4hi (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrecpsq_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_frecpsv2df (__a, __b);
++ return __builtin_aarch64_sshlv2si (__a, __b);
+ }
+
+-/* vrecpx */
+-
+-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
+-vrecpxs_f32 (float32_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- return __builtin_aarch64_frecpxsf (__a);
++ return (int64x1_t) {__builtin_aarch64_sshldi (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
+-vrecpxd_f64 (float64_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_u8 (uint8x8_t __a, int8x8_t __b)
+ {
+- return __builtin_aarch64_frecpxdf (__a);
++ return __builtin_aarch64_ushlv8qi_uus (__a, __b);
+ }
+
+-
+-/* vrev */
+-
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vrev16_p8 (poly8x8_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_u16 (uint16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return __builtin_aarch64_ushlv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrev16_s8 (int8x8_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_u32 (uint32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return __builtin_aarch64_ushlv2si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrev16_u8 (uint8x8_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshl_u64 (uint64x1_t __a, int64x1_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return (uint64x1_t) {__builtin_aarch64_ushldi_uus (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vrev16q_p8 (poly8x16_t a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
++ return __builtin_aarch64_sshlv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrev16q_s8 (int8x16_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
++ return __builtin_aarch64_sshlv8hi (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrev16q_u8 (uint8x16_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
++ return __builtin_aarch64_sshlv4si (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vrev32_p8 (poly8x8_t a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return __builtin_aarch64_sshlv2di (__a, __b);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vrev32_p16 (poly16x4_t a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_u8 (uint8x16_t __a, int8x16_t __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
++ return __builtin_aarch64_ushlv16qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrev32_s8 (int8x8_t a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_u16 (uint16x8_t __a, int16x8_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return __builtin_aarch64_ushlv8hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vrev32_s16 (int16x4_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_u32 (uint32x4_t __a, int32x4_t __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
++ return __builtin_aarch64_ushlv4si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrev32_u8 (uint8x8_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshlq_u64 (uint64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return __builtin_aarch64_ushlv2di_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vrev32_u16 (uint16x4_t a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshld_s64 (int64_t __a, int64_t __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 1, 0, 3, 2 });
++ return __builtin_aarch64_sshldi (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vrev32q_p8 (poly8x16_t a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshld_u64 (uint64_t __a, uint64_t __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
++ return __builtin_aarch64_ushldi_uus (__a, __b);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vrev32q_p16 (poly16x8_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_s8 (int8x16_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return __builtin_aarch64_sshll2_nv16qi (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrev32q_s8 (int8x16_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
++ return __builtin_aarch64_sshll2_nv8hi (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vrev32q_s16 (int16x8_t a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return __builtin_aarch64_sshll2_nv4si (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrev32q_u8 (uint8x16_t a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_u8 (uint8x16_t __a, const int __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 });
++ return (uint16x8_t) __builtin_aarch64_ushll2_nv16qi ((int8x16_t) __a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vrev32q_u16 (uint16x8_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 1, 0, 3, 2, 5, 4, 7, 6 });
++ return (uint32x4_t) __builtin_aarch64_ushll2_nv8hi ((int16x8_t) __a, __b);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrev64_f32 (float32x2_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_high_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
++ return (uint64x2_t) __builtin_aarch64_ushll2_nv4si ((int32x4_t) __a, __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vrev64_p8 (poly8x8_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_s8 (int8x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
++ return __builtin_aarch64_sshll_nv8qi (__a, __b);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vrev64_p16 (poly16x4_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_s16 (int16x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
++ return __builtin_aarch64_sshll_nv4hi (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrev64_s8 (int8x8_t a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_s32 (int32x2_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
++ return __builtin_aarch64_sshll_nv2si (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vrev64_s16 (int16x4_t a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_u8 (uint8x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
++ return __builtin_aarch64_ushll_nv8qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vrev64_s32 (int32x2_t a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_u16 (uint16x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
++ return __builtin_aarch64_ushll_nv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrev64_u8 (uint8x8_t a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshll_n_u32 (uint32x2_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
++ return __builtin_aarch64_ushll_nv2si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vrev64_u16 (uint16x4_t a)
++/* vshr */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_s8 (int8x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x4_t) { 3, 2, 1, 0 });
++ return (int8x8_t) __builtin_aarch64_ashrv8qi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrev64_u32 (uint32x2_t a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_s16 (int16x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x2_t) { 1, 0 });
++ return (int16x4_t) __builtin_aarch64_ashrv4hi (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrev64q_f32 (float32x4_t a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_s32 (int32x2_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
++ return (int32x2_t) __builtin_aarch64_ashrv2si (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vrev64q_p8 (poly8x16_t a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_s64 (int64x1_t __a, const int __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
++ return (int64x1_t) {__builtin_aarch64_ashr_simddi (__a[0], __b)};
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vrev64q_p16 (poly16x8_t a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_u8 (uint8x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return (uint8x8_t) __builtin_aarch64_lshrv8qi ((int8x8_t) __a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrev64q_s8 (int8x16_t a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_u16 (uint16x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
++ return (uint16x4_t) __builtin_aarch64_lshrv4hi ((int16x4_t) __a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vrev64q_s16 (int16x8_t a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_u32 (uint32x2_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return (uint32x2_t) __builtin_aarch64_lshrv2si ((int32x2_t) __a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vrev64q_s32 (int32x4_t a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshr_n_u64 (uint64x1_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
++ return (uint64x1_t) {__builtin_aarch64_lshr_simddi_uus ( __a[0], __b)};
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrev64q_u8 (uint8x16_t a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_s8 (int8x16_t __a, const int __b)
+ {
+- return __builtin_shuffle (a,
+- (uint8x16_t) { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 });
++ return (int8x16_t) __builtin_aarch64_ashrv16qi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vrev64q_u16 (uint16x8_t a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_s16 (int16x8_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint16x8_t) { 3, 2, 1, 0, 7, 6, 5, 4 });
++ return (int16x8_t) __builtin_aarch64_ashrv8hi (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrev64q_u32 (uint32x4_t a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_s32 (int32x4_t __a, const int __b)
+ {
+- return __builtin_shuffle (a, (uint32x4_t) { 1, 0, 3, 2 });
++ return (int32x4_t) __builtin_aarch64_ashrv4si (__a, __b);
+ }
+
+-/* vrnd */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrnd_f32 (float32x2_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_s64 (int64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_btruncv2sf (__a);
++ return (int64x2_t) __builtin_aarch64_ashrv2di (__a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrnd_f64 (float64x1_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_u8 (uint8x16_t __a, const int __b)
+ {
+- return vset_lane_f64 (__builtin_trunc (vget_lane_f64 (__a, 0)), __a, 0);
++ return (uint8x16_t) __builtin_aarch64_lshrv16qi ((int8x16_t) __a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndq_f32 (float32x4_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_u16 (uint16x8_t __a, const int __b)
+ {
+- return __builtin_aarch64_btruncv4sf (__a);
++ return (uint16x8_t) __builtin_aarch64_lshrv8hi ((int16x8_t) __a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndq_f64 (float64x2_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_u32 (uint32x4_t __a, const int __b)
+ {
+- return __builtin_aarch64_btruncv2df (__a);
++ return (uint32x4_t) __builtin_aarch64_lshrv4si ((int32x4_t) __a, __b);
+ }
+
+-/* vrnda */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrnda_f32 (float32x2_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrq_n_u64 (uint64x2_t __a, const int __b)
+ {
+- return __builtin_aarch64_roundv2sf (__a);
++ return (uint64x2_t) __builtin_aarch64_lshrv2di ((int64x2_t) __a, __b);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrnda_f64 (float64x1_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrd_n_s64 (int64_t __a, const int __b)
+ {
+- return vset_lane_f64 (__builtin_round (vget_lane_f64 (__a, 0)), __a, 0);
++ return __builtin_aarch64_ashr_simddi (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndaq_f32 (float32x4_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vshrd_n_u64 (uint64_t __a, const int __b)
+ {
+- return __builtin_aarch64_roundv4sf (__a);
++ return __builtin_aarch64_lshr_simddi_uus (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndaq_f64 (float64x2_t __a)
++/* vsli */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_roundv2df (__a);
++ return (int8x8_t) __builtin_aarch64_ssli_nv8qi (__a, __b, __c);
+ }
+
+-/* vrndi */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrndi_f32 (float32x2_t __a)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_nearbyintv2sf (__a);
++ return (int16x4_t) __builtin_aarch64_ssli_nv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrndi_f64 (float64x1_t __a)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- return vset_lane_f64 (__builtin_nearbyint (vget_lane_f64 (__a, 0)), __a, 0);
++ return (int32x2_t) __builtin_aarch64_ssli_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndiq_f32 (float32x4_t __a)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+ {
+- return __builtin_aarch64_nearbyintv4sf (__a);
++ return (int64x1_t) {__builtin_aarch64_ssli_ndi (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndiq_f64 (float64x2_t __a)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_nearbyintv2df (__a);
++ return __builtin_aarch64_usli_nv8qi_uuus (__a, __b, __c);
+ }
+
+-/* vrndm */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrndm_f32 (float32x2_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_floorv2sf (__a);
++ return __builtin_aarch64_usli_nv4hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrndm_f64 (float64x1_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+ {
+- return vset_lane_f64 (__builtin_floor (vget_lane_f64 (__a, 0)), __a, 0);
++ return __builtin_aarch64_usli_nv2si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndmq_f32 (float32x4_t __a)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsli_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+ {
+- return __builtin_aarch64_floorv4sf (__a);
++ return (uint64x1_t) {__builtin_aarch64_usli_ndi_uuus (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndmq_f64 (float64x2_t __a)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+ {
+- return __builtin_aarch64_floorv2df (__a);
++ return (int8x16_t) __builtin_aarch64_ssli_nv16qi (__a, __b, __c);
+ }
+
+-/* vrndn */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrndn_f32 (float32x2_t __a)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_frintnv2sf (__a);
++ return (int16x8_t) __builtin_aarch64_ssli_nv8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrndn_f64 (float64x1_t __a)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+ {
+- return (float64x1_t) {__builtin_aarch64_frintndf (__a[0])};
++ return (int32x4_t) __builtin_aarch64_ssli_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndnq_f32 (float32x4_t __a)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_frintnv4sf (__a);
++ return (int64x2_t) __builtin_aarch64_ssli_nv2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndnq_f64 (float64x2_t __a)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+ {
+- return __builtin_aarch64_frintnv2df (__a);
++ return __builtin_aarch64_usli_nv16qi_uuus (__a, __b, __c);
+ }
+
+-/* vrndp */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrndp_f32 (float32x2_t __a)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_ceilv2sf (__a);
++ return __builtin_aarch64_usli_nv8hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrndp_f64 (float64x1_t __a)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+ {
+- return vset_lane_f64 (__builtin_ceil (vget_lane_f64 (__a, 0)), __a, 0);
++ return __builtin_aarch64_usli_nv4si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndpq_f32 (float32x4_t __a)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsliq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_ceilv4sf (__a);
++ return __builtin_aarch64_usli_nv2di_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndpq_f64 (float64x2_t __a)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vslid_n_s64 (int64_t __a, int64_t __b, const int __c)
+ {
+- return __builtin_aarch64_ceilv2df (__a);
++ return __builtin_aarch64_ssli_ndi (__a, __b, __c);
+ }
+
+-/* vrndx */
+-
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vrndx_f32 (float32x2_t __a)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vslid_n_u64 (uint64_t __a, uint64_t __b, const int __c)
+ {
+- return __builtin_aarch64_rintv2sf (__a);
++ return __builtin_aarch64_usli_ndi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vrndx_f64 (float64x1_t __a)
++/* vsqadd */
++
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqadd_u8 (uint8x8_t __a, int8x8_t __b)
+ {
+- return vset_lane_f64 (__builtin_rint (vget_lane_f64 (__a, 0)), __a, 0);
++ return __builtin_aarch64_usqaddv8qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vrndxq_f32 (float32x4_t __a)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqadd_u16 (uint16x4_t __a, int16x4_t __b)
+ {
+- return __builtin_aarch64_rintv4sf (__a);
++ return __builtin_aarch64_usqaddv4hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vrndxq_f64 (float64x2_t __a)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqadd_u32 (uint32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_rintv2df (__a);
++ return __builtin_aarch64_usqaddv2si_uus (__a, __b);
+ }
+
+-/* vrshl */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrshl_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqadd_u64 (uint64x1_t __a, int64x1_t __b)
+ {
+- return (int8x8_t) __builtin_aarch64_srshlv8qi (__a, __b);
++ return (uint64x1_t) {__builtin_aarch64_usqadddi_uus (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vrshl_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddq_u8 (uint8x16_t __a, int8x16_t __b)
+ {
+- return (int16x4_t) __builtin_aarch64_srshlv4hi (__a, __b);
++ return __builtin_aarch64_usqaddv16qi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vrshl_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddq_u16 (uint16x8_t __a, int16x8_t __b)
+ {
+- return (int32x2_t) __builtin_aarch64_srshlv2si (__a, __b);
++ return __builtin_aarch64_usqaddv8hi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vrshl_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddq_u32 (uint32x4_t __a, int32x4_t __b)
+ {
+- return (int64x1_t) {__builtin_aarch64_srshldi (__a[0], __b[0])};
++ return __builtin_aarch64_usqaddv4si_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrshl_u8 (uint8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddq_u64 (uint64x2_t __a, int64x2_t __b)
+ {
+- return __builtin_aarch64_urshlv8qi_uus (__a, __b);
++ return __builtin_aarch64_usqaddv2di_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vrshl_u16 (uint16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddb_u8 (uint8_t __a, int8_t __b)
+ {
+- return __builtin_aarch64_urshlv4hi_uus (__a, __b);
++ return __builtin_aarch64_usqaddqi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrshl_u32 (uint32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddh_u16 (uint16_t __a, int16_t __b)
+ {
+- return __builtin_aarch64_urshlv2si_uus (__a, __b);
++ return __builtin_aarch64_usqaddhi_uus (__a, __b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vrshl_u64 (uint64x1_t __a, int64x1_t __b)
++__extension__ extern __inline uint32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqadds_u32 (uint32_t __a, int32_t __b)
+ {
+- return (uint64x1_t) {__builtin_aarch64_urshldi_uus (__a[0], __b[0])};
++ return __builtin_aarch64_usqaddsi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrshlq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqaddd_u64 (uint64_t __a, int64_t __b)
+ {
+- return (int8x16_t) __builtin_aarch64_srshlv16qi (__a, __b);
++ return __builtin_aarch64_usqadddi_uus (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vrshlq_s16 (int16x8_t __a, int16x8_t __b)
++/* vsqrt */
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrt_f32 (float32x2_t a)
+ {
+- return (int16x8_t) __builtin_aarch64_srshlv8hi (__a, __b);
++ return __builtin_aarch64_sqrtv2sf (a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vrshlq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrtq_f32 (float32x4_t a)
+ {
+- return (int32x4_t) __builtin_aarch64_srshlv4si (__a, __b);
++ return __builtin_aarch64_sqrtv4sf (a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vrshlq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrt_f64 (float64x1_t a)
+ {
+- return (int64x2_t) __builtin_aarch64_srshlv2di (__a, __b);
++ return (float64x1_t) { __builtin_aarch64_sqrtdf (a[0]) };
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrshlq_u8 (uint8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrtq_f64 (float64x2_t a)
+ {
+- return __builtin_aarch64_urshlv16qi_uus (__a, __b);
++ return __builtin_aarch64_sqrtv2df (a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vrshlq_u16 (uint16x8_t __a, int16x8_t __b)
++/* vsra */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshlv8hi_uus (__a, __b);
++ return (int8x8_t) __builtin_aarch64_ssra_nv8qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrshlq_u32 (uint32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshlv4si_uus (__a, __b);
++ return (int16x4_t) __builtin_aarch64_ssra_nv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vrshlq_u64 (uint64x2_t __a, int64x2_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshlv2di_uus (__a, __b);
++ return (int32x2_t) __builtin_aarch64_ssra_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vrshld_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+ {
+- return __builtin_aarch64_srshldi (__a, __b);
++ return (int64x1_t) {__builtin_aarch64_ssra_ndi (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vrshld_u64 (uint64_t __a, int64_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshldi_uus (__a, __b);
++ return __builtin_aarch64_usra_nv8qi_uuus (__a, __b, __c);
+ }
+
+-/* vrshr */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrshr_n_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+ {
+- return (int8x8_t) __builtin_aarch64_srshr_nv8qi (__a, __b);
++ return __builtin_aarch64_usra_nv4hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vrshr_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+ {
+- return (int16x4_t) __builtin_aarch64_srshr_nv4hi (__a, __b);
++ return __builtin_aarch64_usra_nv2si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vrshr_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+ {
+- return (int32x2_t) __builtin_aarch64_srshr_nv2si (__a, __b);
++ return (uint64x1_t) {__builtin_aarch64_usra_ndi_uuus (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vrshr_n_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+ {
+- return (int64x1_t) {__builtin_aarch64_srshr_ndi (__a[0], __b)};
++ return (int8x16_t) __builtin_aarch64_ssra_nv16qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrshr_n_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv8qi_uus (__a, __b);
++ return (int16x8_t) __builtin_aarch64_ssra_nv8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vrshr_n_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv4hi_uus (__a, __b);
++ return (int32x4_t) __builtin_aarch64_ssra_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrshr_n_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv2si_uus (__a, __b);
++ return (int64x2_t) __builtin_aarch64_ssra_nv2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vrshr_n_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+ {
+- return (uint64x1_t) {__builtin_aarch64_urshr_ndi_uus (__a[0], __b)};
++ return __builtin_aarch64_usra_nv16qi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrshrq_n_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+ {
+- return (int8x16_t) __builtin_aarch64_srshr_nv16qi (__a, __b);
++ return __builtin_aarch64_usra_nv8hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vrshrq_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+ {
+- return (int16x8_t) __builtin_aarch64_srshr_nv8hi (__a, __b);
++ return __builtin_aarch64_usra_nv4si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vrshrq_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+ {
+- return (int32x4_t) __builtin_aarch64_srshr_nv4si (__a, __b);
++ return __builtin_aarch64_usra_nv2di_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vrshrq_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsrad_n_s64 (int64_t __a, int64_t __b, const int __c)
+ {
+- return (int64x2_t) __builtin_aarch64_srshr_nv2di (__a, __b);
++ return __builtin_aarch64_ssra_ndi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrshrq_n_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv16qi_uus (__a, __b);
++ return __builtin_aarch64_usra_ndi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vrshrq_n_u16 (uint16x8_t __a, const int __b)
++/* vsri */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv8hi_uus (__a, __b);
++ return (int8x8_t) __builtin_aarch64_ssri_nv8qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrshrq_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv4si_uus (__a, __b);
++ return (int16x4_t) __builtin_aarch64_ssri_nv4hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vrshrq_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_nv2di_uus (__a, __b);
++ return (int32x2_t) __builtin_aarch64_ssri_nv2si (__a, __b, __c);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vrshrd_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
+ {
+- return __builtin_aarch64_srshr_ndi (__a, __b);
++ return (int64x1_t) {__builtin_aarch64_ssri_ndi (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vrshrd_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_urshr_ndi_uus (__a, __b);
++ return __builtin_aarch64_usri_nv8qi_uuus (__a, __b, __c);
+ }
+
+-/* vrsra */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vrsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
+ {
+- return (int8x8_t) __builtin_aarch64_srsra_nv8qi (__a, __b, __c);
++ return __builtin_aarch64_usri_nv4hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vrsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
+ {
+- return (int16x4_t) __builtin_aarch64_srsra_nv4hi (__a, __b, __c);
++ return __builtin_aarch64_usri_nv2si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vrsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsri_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
+ {
+- return (int32x2_t) __builtin_aarch64_srsra_nv2si (__a, __b, __c);
++ return (uint64x1_t) {__builtin_aarch64_usri_ndi_uuus (__a[0], __b[0], __c)};
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vrsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
+ {
+- return (int64x1_t) {__builtin_aarch64_srsra_ndi (__a[0], __b[0], __c)};
++ return (int8x16_t) __builtin_aarch64_ssri_nv16qi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vrsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
+ {
+- return __builtin_aarch64_ursra_nv8qi_uuus (__a, __b, __c);
++ return (int16x8_t) __builtin_aarch64_ssri_nv8hi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vrsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
+ {
+- return __builtin_aarch64_ursra_nv4hi_uuus (__a, __b, __c);
++ return (int32x4_t) __builtin_aarch64_ssri_nv4si (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vrsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
+ {
+- return __builtin_aarch64_ursra_nv2si_uuus (__a, __b, __c);
++ return (int64x2_t) __builtin_aarch64_ssri_nv2di (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vrsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
+ {
+- return (uint64x1_t) {__builtin_aarch64_ursra_ndi_uuus (__a[0], __b[0], __c)};
++ return __builtin_aarch64_usri_nv16qi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vrsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
+ {
+- return (int8x16_t) __builtin_aarch64_srsra_nv16qi (__a, __b, __c);
++ return __builtin_aarch64_usri_nv8hi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vrsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
+ {
+- return (int16x8_t) __builtin_aarch64_srsra_nv8hi (__a, __b, __c);
++ return __builtin_aarch64_usri_nv4si_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vrsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsriq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
+ {
+- return (int32x4_t) __builtin_aarch64_srsra_nv4si (__a, __b, __c);
++ return __builtin_aarch64_usri_nv2di_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vrsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsrid_n_s64 (int64_t __a, int64_t __b, const int __c)
+ {
+- return (int64x2_t) __builtin_aarch64_srsra_nv2di (__a, __b, __c);
++ return __builtin_aarch64_ssri_ndi (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vrsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsrid_n_u64 (uint64_t __a, uint64_t __b, const int __c)
+ {
+- return __builtin_aarch64_ursra_nv16qi_uuus (__a, __b, __c);
++ return __builtin_aarch64_usri_ndi_uuus (__a, __b, __c);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vrsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
++/* vst1 */
++
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_f16 (float16_t *__a, float16x4_t __b)
+ {
+- return __builtin_aarch64_ursra_nv8hi_uuus (__a, __b, __c);
++ __builtin_aarch64_st1v4hf (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vrsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_f32 (float32_t *a, float32x2_t b)
+ {
+- return __builtin_aarch64_ursra_nv4si_uuus (__a, __b, __c);
++ __builtin_aarch64_st1v2sf ((__builtin_aarch64_simd_sf *) a, b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vrsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_f64 (float64_t *a, float64x1_t b)
+ {
+- return __builtin_aarch64_ursra_nv2di_uuus (__a, __b, __c);
++ *a = b[0];
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vrsrad_n_s64 (int64_t __a, int64_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_p8 (poly8_t *a, poly8x8_t b)
+ {
+- return __builtin_aarch64_srsra_ndi (__a, __b, __c);
++ __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a,
++ (int8x8_t) b);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_p16 (poly16_t *a, poly16x4_t b)
+ {
+- return __builtin_aarch64_ursra_ndi_uuus (__a, __b, __c);
++ __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a,
++ (int16x4_t) b);
+ }
+
+-#pragma GCC push_options
+-#pragma GCC target ("+nothing+crypto")
+-
+-/* vsha1 */
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1cq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_s8 (int8_t *a, int8x8_t b)
+ {
+- return __builtin_aarch64_crypto_sha1cv4si_uuuu (hash_abcd, hash_e, wk);
++ __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a, b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1mq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_s16 (int16_t *a, int16x4_t b)
+ {
+- return __builtin_aarch64_crypto_sha1mv4si_uuuu (hash_abcd, hash_e, wk);
++ __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a, b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1pq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_s32 (int32_t *a, int32x2_t b)
+ {
+- return __builtin_aarch64_crypto_sha1pv4si_uuuu (hash_abcd, hash_e, wk);
++ __builtin_aarch64_st1v2si ((__builtin_aarch64_simd_si *) a, b);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vsha1h_u32 (uint32_t hash_e)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_s64 (int64_t *a, int64x1_t b)
+ {
+- return __builtin_aarch64_crypto_sha1hsi_uu (hash_e);
++ *a = b[0];
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7, uint32x4_t w8_11)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_u8 (uint8_t *a, uint8x8_t b)
+ {
+- return __builtin_aarch64_crypto_sha1su0v4si_uuuu (w0_3, w4_7, w8_11);
++ __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a,
++ (int8x8_t) b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1su1q_u32 (uint32x4_t tw0_3, uint32x4_t w12_15)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_u16 (uint16_t *a, uint16x4_t b)
+ {
+- return __builtin_aarch64_crypto_sha1su1v4si_uuu (tw0_3, w12_15);
++ __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a,
++ (int16x4_t) b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256hq_u32 (uint32x4_t hash_abcd, uint32x4_t hash_efgh, uint32x4_t wk)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_u32 (uint32_t *a, uint32x2_t b)
+ {
+- return __builtin_aarch64_crypto_sha256hv4si_uuuu (hash_abcd, hash_efgh, wk);
++ __builtin_aarch64_st1v2si ((__builtin_aarch64_simd_si *) a,
++ (int32x2_t) b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256h2q_u32 (uint32x4_t hash_efgh, uint32x4_t hash_abcd, uint32x4_t wk)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_u64 (uint64_t *a, uint64x1_t b)
+ {
+- return __builtin_aarch64_crypto_sha256h2v4si_uuuu (hash_efgh, hash_abcd, wk);
++ *a = b[0];
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7)
++/* vst1q */
++
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_f16 (float16_t *__a, float16x8_t __b)
+ {
+- return __builtin_aarch64_crypto_sha256su0v4si_uuu (w0_3, w4_7);
++ __builtin_aarch64_st1v8hf (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256su1q_u32 (uint32x4_t tw0_3, uint32x4_t w8_11, uint32x4_t w12_15)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_f32 (float32_t *a, float32x4_t b)
+ {
+- return __builtin_aarch64_crypto_sha256su1v4si_uuuu (tw0_3, w8_11, w12_15);
++ __builtin_aarch64_st1v4sf ((__builtin_aarch64_simd_sf *) a, b);
+ }
+
+-__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+-vmull_p64 (poly64_t a, poly64_t b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_f64 (float64_t *a, float64x2_t b)
+ {
+- return
+- __builtin_aarch64_crypto_pmulldi_ppp (a, b);
++ __builtin_aarch64_st1v2df ((__builtin_aarch64_simd_df *) a, b);
+ }
+
+-__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+-vmull_high_p64 (poly64x2_t a, poly64x2_t b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_p8 (poly8_t *a, poly8x16_t b)
+ {
+- return __builtin_aarch64_crypto_pmullv2di_ppp (a, b);
++ __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a,
++ (int8x16_t) b);
+ }
+
+-#pragma GCC pop_options
+-
+-/* vshl */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vshl_n_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_p16 (poly16_t *a, poly16x8_t b)
+ {
+- return (int8x8_t) __builtin_aarch64_ashlv8qi (__a, __b);
++ __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a,
++ (int16x8_t) b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vshl_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_s8 (int8_t *a, int8x16_t b)
+ {
+- return (int16x4_t) __builtin_aarch64_ashlv4hi (__a, __b);
++ __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a, b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vshl_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_s16 (int16_t *a, int16x8_t b)
+ {
+- return (int32x2_t) __builtin_aarch64_ashlv2si (__a, __b);
++ __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a, b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vshl_n_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_s32 (int32_t *a, int32x4_t b)
+ {
+- return (int64x1_t) {__builtin_aarch64_ashldi (__a[0], __b)};
++ __builtin_aarch64_st1v4si ((__builtin_aarch64_simd_si *) a, b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vshl_n_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_s64 (int64_t *a, int64x2_t b)
+ {
+- return (uint8x8_t) __builtin_aarch64_ashlv8qi ((int8x8_t) __a, __b);
++ __builtin_aarch64_st1v2di ((__builtin_aarch64_simd_di *) a, b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vshl_n_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_u8 (uint8_t *a, uint8x16_t b)
+ {
+- return (uint16x4_t) __builtin_aarch64_ashlv4hi ((int16x4_t) __a, __b);
++ __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a,
++ (int8x16_t) b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vshl_n_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_u16 (uint16_t *a, uint16x8_t b)
+ {
+- return (uint32x2_t) __builtin_aarch64_ashlv2si ((int32x2_t) __a, __b);
++ __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a,
++ (int16x8_t) b);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vshl_n_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_u32 (uint32_t *a, uint32x4_t b)
+ {
+- return (uint64x1_t) {__builtin_aarch64_ashldi ((int64_t) __a[0], __b)};
++ __builtin_aarch64_st1v4si ((__builtin_aarch64_simd_si *) a,
++ (int32x4_t) b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vshlq_n_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_u64 (uint64_t *a, uint64x2_t b)
+ {
+- return (int8x16_t) __builtin_aarch64_ashlv16qi (__a, __b);
++ __builtin_aarch64_st1v2di ((__builtin_aarch64_simd_di *) a,
++ (int64x2_t) b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vshlq_n_s16 (int16x8_t __a, const int __b)
++/* vst1_lane */
++
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_f16 (float16_t *__a, float16x4_t __b, const int __lane)
+ {
+- return (int16x8_t) __builtin_aarch64_ashlv8hi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vshlq_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_f32 (float32_t *__a, float32x2_t __b, const int __lane)
+ {
+- return (int32x4_t) __builtin_aarch64_ashlv4si (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vshlq_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_f64 (float64_t *__a, float64x1_t __b, const int __lane)
+ {
+- return (int64x2_t) __builtin_aarch64_ashlv2di (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vshlq_n_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_p8 (poly8_t *__a, poly8x8_t __b, const int __lane)
+ {
+- return (uint8x16_t) __builtin_aarch64_ashlv16qi ((int8x16_t) __a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vshlq_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_p16 (poly16_t *__a, poly16x4_t __b, const int __lane)
+ {
+- return (uint16x8_t) __builtin_aarch64_ashlv8hi ((int16x8_t) __a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vshlq_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_s8 (int8_t *__a, int8x8_t __b, const int __lane)
+ {
+- return (uint32x4_t) __builtin_aarch64_ashlv4si ((int32x4_t) __a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vshlq_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_s16 (int16_t *__a, int16x4_t __b, const int __lane)
+ {
+- return (uint64x2_t) __builtin_aarch64_ashlv2di ((int64x2_t) __a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vshld_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_s32 (int32_t *__a, int32x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ashldi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vshld_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_s64 (int64_t *__a, int64x1_t __b, const int __lane)
+ {
+- return (uint64_t) __builtin_aarch64_ashldi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vshl_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_u8 (uint8_t *__a, uint8x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv8qi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vshl_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_u16 (uint16_t *__a, uint16x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv4hi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vshl_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_u32 (uint32_t *__a, uint32x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv2si (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vshl_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1_lane_u64 (uint64_t *__a, uint64x1_t __b, const int __lane)
+ {
+- return (int64x1_t) {__builtin_aarch64_sshldi (__a[0], __b[0])};
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vshl_u8 (uint8x8_t __a, int8x8_t __b)
++/* vst1q_lane */
++
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_f16 (float16_t *__a, float16x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv8qi_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vshl_u16 (uint16x4_t __a, int16x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_f32 (float32_t *__a, float32x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv4hi_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vshl_u32 (uint32x2_t __a, int32x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_f64 (float64_t *__a, float64x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv2si_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vshl_u64 (uint64x1_t __a, int64x1_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_p8 (poly8_t *__a, poly8x16_t __b, const int __lane)
+ {
+- return (uint64x1_t) {__builtin_aarch64_ushldi_uus (__a[0], __b[0])};
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vshlq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_p16 (poly16_t *__a, poly16x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv16qi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vshlq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_s8 (int8_t *__a, int8x16_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv8hi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vshlq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_s16 (int16_t *__a, int16x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv4si (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vshlq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_s32 (int32_t *__a, int32x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshlv2di (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vshlq_u8 (uint8x16_t __a, int8x16_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_s64 (int64_t *__a, int64x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv16qi_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vshlq_u16 (uint16x8_t __a, int16x8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_u8 (uint8_t *__a, uint8x16_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv8hi_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vshlq_u32 (uint32x4_t __a, int32x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_u16 (uint16_t *__a, uint16x8_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv4si_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vshlq_u64 (uint64x2_t __a, int64x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_u32 (uint32_t *__a, uint32x4_t __b, const int __lane)
+ {
+- return __builtin_aarch64_ushlv2di_uus (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vshld_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst1q_lane_u64 (uint64_t *__a, uint64x2_t __b, const int __lane)
+ {
+- return __builtin_aarch64_sshldi (__a, __b);
++ *__a = __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vshld_u64 (uint64_t __a, uint64_t __b)
++/* vstn */
++
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_s64 (int64_t * __a, int64x1x2_t val)
+ {
+- return __builtin_aarch64_ushldi_uus (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ int64x2x2_t temp;
++ temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1);
++ __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vshll_high_n_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_u64 (uint64_t * __a, uint64x1x2_t val)
+ {
+- return __builtin_aarch64_sshll2_nv16qi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ uint64x2x2_t temp;
++ temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1);
++ __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vshll_high_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_f64 (float64_t * __a, float64x1x2_t val)
+ {
+- return __builtin_aarch64_sshll2_nv8hi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ float64x2x2_t temp;
++ temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[1], 1);
++ __builtin_aarch64_st2df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vshll_high_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_s8 (int8_t * __a, int8x8x2_t val)
+ {
+- return __builtin_aarch64_sshll2_nv4si (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ int8x16x2_t temp;
++ temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vshll_high_n_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_p8 (poly8_t * __a, poly8x8x2_t val)
+ {
+- return (uint16x8_t) __builtin_aarch64_ushll2_nv16qi ((int8x16_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ poly8x16x2_t temp;
++ temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vshll_high_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_s16 (int16_t * __a, int16x4x2_t val)
+ {
+- return (uint32x4_t) __builtin_aarch64_ushll2_nv8hi ((int16x8_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ int16x8x2_t temp;
++ temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vshll_high_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_p16 (poly16_t * __a, poly16x4x2_t val)
+ {
+- return (uint64x2_t) __builtin_aarch64_ushll2_nv4si ((int32x4_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ poly16x8x2_t temp;
++ temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vshll_n_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_s32 (int32_t * __a, int32x2x2_t val)
+ {
+- return __builtin_aarch64_sshll_nv8qi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ int32x4x2_t temp;
++ temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1);
++ __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vshll_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_u8 (uint8_t * __a, uint8x8x2_t val)
+ {
+- return __builtin_aarch64_sshll_nv4hi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ uint8x16x2_t temp;
++ temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vshll_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_u16 (uint16_t * __a, uint16x4x2_t val)
+ {
+- return __builtin_aarch64_sshll_nv2si (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ uint16x8x2_t temp;
++ temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vshll_n_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_u32 (uint32_t * __a, uint32x2x2_t val)
+ {
+- return __builtin_aarch64_ushll_nv8qi_uus (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ uint32x4x2_t temp;
++ temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1);
++ __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vshll_n_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_f16 (float16_t * __a, float16x4x2_t val)
+ {
+- return __builtin_aarch64_ushll_nv4hi_uus (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ float16x8x2_t temp;
++ temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[1], 1);
++ __builtin_aarch64_st2v4hf (__a, __o);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vshll_n_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2_f32 (float32_t * __a, float32x2x2_t val)
+ {
+- return __builtin_aarch64_ushll_nv2si_uus (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ float32x4x2_t temp;
++ temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[1], 1);
++ __builtin_aarch64_st2v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
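The new extern-inline d-register vst2 variants above all follow the same pattern: each 64-bit input is widened to 128 bits with vcombine_*/vcreate_* (zero-padding the upper half), the two q-registers are packed into an opaque __builtin_aarch64_simd_oi tuple, and the st2 builtin then performs the interleaving store of the low halves only. A minimal usage sketch, illustrative only and not part of the patch (the function name interleave_s8 is hypothetical; assumes <arm_neon.h> on an AArch64 target):

  #include <arm_neon.h>

  /* Interleave two 8-lane byte vectors into dst:
     dst[0] = even[0], dst[1] = odd[0], dst[2] = even[1], ...  */
  void
  interleave_s8 (int8_t *dst, int8x8_t even, int8x8_t odd)
  {
    int8x8x2_t pair = { { even, odd } };
    vst2_s8 (dst, pair);   /* writes 16 interleaved bytes to dst */
  }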
+-/* vshr */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vshr_n_s8 (int8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_s8 (int8_t * __a, int8x16x2_t val)
+ {
+- return (int8x8_t) __builtin_aarch64_ashrv8qi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vshr_n_s16 (int16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_p8 (poly8_t * __a, poly8x16x2_t val)
+ {
+- return (int16x4_t) __builtin_aarch64_ashrv4hi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vshr_n_s32 (int32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_s16 (int16_t * __a, int16x8x2_t val)
+ {
+- return (int32x2_t) __builtin_aarch64_ashrv2si (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vshr_n_s64 (int64x1_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_p16 (poly16_t * __a, poly16x8x2_t val)
+ {
+- return (int64x1_t) {__builtin_aarch64_ashr_simddi (__a[0], __b)};
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vshr_n_u8 (uint8x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_s32 (int32_t * __a, int32x4x2_t val)
+ {
+- return (uint8x8_t) __builtin_aarch64_lshrv8qi ((int8x8_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1);
++ __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vshr_n_u16 (uint16x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_s64 (int64_t * __a, int64x2x2_t val)
+ {
+- return (uint16x4_t) __builtin_aarch64_lshrv4hi ((int16x4_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1);
++ __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vshr_n_u32 (uint32x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_u8 (uint8_t * __a, uint8x16x2_t val)
+ {
+- return (uint32x2_t) __builtin_aarch64_lshrv2si ((int32x2_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vshr_n_u64 (uint64x1_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_u16 (uint16_t * __a, uint16x8x2_t val)
+ {
+- return (uint64x1_t) {__builtin_aarch64_lshr_simddi_uus ( __a[0], __b)};
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vshrq_n_s8 (int8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_u32 (uint32_t * __a, uint32x4x2_t val)
+ {
+- return (int8x16_t) __builtin_aarch64_ashrv16qi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1);
++ __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vshrq_n_s16 (int16x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_u64 (uint64_t * __a, uint64x2x2_t val)
+ {
+- return (int16x8_t) __builtin_aarch64_ashrv8hi (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1);
++ __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vshrq_n_s32 (int32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_f16 (float16_t * __a, float16x8x2_t val)
+ {
+- return (int32x4_t) __builtin_aarch64_ashrv4si (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[1], 1);
++ __builtin_aarch64_st2v8hf (__a, __o);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vshrq_n_s64 (int64x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_f32 (float32_t * __a, float32x4x2_t val)
+ {
+- return (int64x2_t) __builtin_aarch64_ashrv2di (__a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[1], 1);
++ __builtin_aarch64_st2v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vshrq_n_u8 (uint8x16_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst2q_f64 (float64_t * __a, float64x2x2_t val)
+ {
+- return (uint8x16_t) __builtin_aarch64_lshrv16qi ((int8x16_t) __a, __b);
++ __builtin_aarch64_simd_oi __o;
++ __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[1], 1);
++ __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vshrq_n_u16 (uint16x8_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_s64 (int64_t * __a, int64x1x3_t val)
+ {
+- return (uint16x8_t) __builtin_aarch64_lshrv8hi ((int16x8_t) __a, __b);
++ __builtin_aarch64_simd_ci __o;
++ int64x2x3_t temp;
++ temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s64 (val.val[2], vcreate_s64 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2);
++ __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vshrq_n_u32 (uint32x4_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_u64 (uint64_t * __a, uint64x1x3_t val)
+ {
+- return (uint32x4_t) __builtin_aarch64_lshrv4si ((int32x4_t) __a, __b);
++ __builtin_aarch64_simd_ci __o;
++ uint64x2x3_t temp;
++ temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u64 (val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2);
++ __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vshrq_n_u64 (uint64x2_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_f64 (float64_t * __a, float64x1x3_t val)
+ {
+- return (uint64x2_t) __builtin_aarch64_lshrv2di ((int64x2_t) __a, __b);
++ __builtin_aarch64_simd_ci __o;
++ float64x2x3_t temp;
++ temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f64 (val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[2], 2);
++ __builtin_aarch64_st3df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vshrd_n_s64 (int64_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_s8 (int8_t * __a, int8x8x3_t val)
+ {
+- return __builtin_aarch64_ashr_simddi (__a, __b);
++ __builtin_aarch64_simd_ci __o;
++ int8x16x3_t temp;
++ temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s8 (val.val[2], vcreate_s8 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vshrd_n_u64 (uint64_t __a, const int __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_p8 (poly8_t * __a, poly8x8x3_t val)
+ {
+- return __builtin_aarch64_lshr_simddi_uus (__a, __b);
++ __builtin_aarch64_simd_ci __o;
++ poly8x16x3_t temp;
++ temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_p8 (val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-/* vsli */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vsli_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_s16 (int16_t * __a, int16x4x3_t val)
+ {
+- return (int8x8_t) __builtin_aarch64_ssli_nv8qi (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ int16x8x3_t temp;
++ temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s16 (val.val[2], vcreate_s16 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vsli_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_p16 (poly16_t * __a, poly16x4x3_t val)
+ {
+- return (int16x4_t) __builtin_aarch64_ssli_nv4hi (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ poly16x8x3_t temp;
++ temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_p16 (val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vsli_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_s32 (int32_t * __a, int32x2x3_t val)
+ {
+- return (int32x2_t) __builtin_aarch64_ssli_nv2si (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ int32x4x3_t temp;
++ temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s32 (val.val[2], vcreate_s32 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2);
++ __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vsli_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_u8 (uint8_t * __a, uint8x8x3_t val)
+ {
+- return (int64x1_t) {__builtin_aarch64_ssli_ndi (__a[0], __b[0], __c)};
++ __builtin_aarch64_simd_ci __o;
++ uint8x16x3_t temp;
++ temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u8 (val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vsli_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_u16 (uint16_t * __a, uint16x4x3_t val)
+ {
+- return __builtin_aarch64_usli_nv8qi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ uint16x8x3_t temp;
++ temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u16 (val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vsli_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_u32 (uint32_t * __a, uint32x2x3_t val)
+ {
+- return __builtin_aarch64_usli_nv4hi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ uint32x4x3_t temp;
++ temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u32 (val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2);
++ __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vsli_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_f16 (float16_t * __a, float16x4x3_t val)
+ {
+- return __builtin_aarch64_usli_nv2si_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ float16x8x3_t temp;
++ temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[2], 2);
++ __builtin_aarch64_st3v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vsli_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3_f32 (float32_t * __a, float32x2x3_t val)
+ {
+- return (uint64x1_t) {__builtin_aarch64_usli_ndi_uuus (__a[0], __b[0], __c)};
++ __builtin_aarch64_simd_ci __o;
++ float32x4x3_t temp;
++ temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f32 (val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[2], 2);
++ __builtin_aarch64_st3v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vsliq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_s8 (int8_t * __a, int8x16x3_t val)
+ {
+- return (int8x16_t) __builtin_aarch64_ssli_nv16qi (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
++ __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vsliq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_p8 (poly8_t * __a, poly8x16x3_t val)
+ {
+- return (int16x8_t) __builtin_aarch64_ssli_nv8hi (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
++ __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vsliq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_s16 (int16_t * __a, int16x8x3_t val)
+ {
+- return (int32x4_t) __builtin_aarch64_ssli_nv4si (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
++ __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vsliq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_p16 (poly16_t * __a, poly16x8x3_t val)
+ {
+- return (int64x2_t) __builtin_aarch64_ssli_nv2di (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
++ __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vsliq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_s32 (int32_t * __a, int32x4x3_t val)
+ {
+- return __builtin_aarch64_usli_nv16qi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2);
++ __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vsliq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_s64 (int64_t * __a, int64x2x3_t val)
+ {
+- return __builtin_aarch64_usli_nv8hi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2);
++ __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsliq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_u8 (uint8_t * __a, uint8x16x3_t val)
+ {
+- return __builtin_aarch64_usli_nv4si_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
++ __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vsliq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_u16 (uint16_t * __a, uint16x8x3_t val)
+ {
+- return __builtin_aarch64_usli_nv2di_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
++ __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vslid_n_s64 (int64_t __a, int64_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_u32 (uint32_t * __a, uint32x4x3_t val)
+ {
+- return __builtin_aarch64_ssli_ndi (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2);
++ __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vslid_n_u64 (uint64_t __a, uint64_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_u64 (uint64_t * __a, uint64x2x3_t val)
+ {
+- return __builtin_aarch64_usli_ndi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2);
++ __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-/* vsqadd */
+-
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vsqadd_u8 (uint8x8_t __a, int8x8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_f16 (float16_t * __a, float16x8x3_t val)
+ {
+- return __builtin_aarch64_usqaddv8qi_uus (__a, __b);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[2], 2);
++ __builtin_aarch64_st3v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vsqadd_u16 (uint16x4_t __a, int16x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_f32 (float32_t * __a, float32x4x3_t val)
+ {
+- return __builtin_aarch64_usqaddv4hi_uus (__a, __b);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[2], 2);
++ __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vsqadd_u32 (uint32x2_t __a, int32x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst3q_f64 (float64_t * __a, float64x2x3_t val)
+ {
+- return __builtin_aarch64_usqaddv2si_uus (__a, __b);
++ __builtin_aarch64_simd_ci __o;
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[2], 2);
++ __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vsqadd_u64 (uint64x1_t __a, int64x1_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_s64 (int64_t * __a, int64x1x4_t val)
+ {
+- return (uint64x1_t) {__builtin_aarch64_usqadddi_uus (__a[0], __b[0])};
++ __builtin_aarch64_simd_xi __o;
++ int64x2x4_t temp;
++ temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s64 (val.val[2], vcreate_s64 (__AARCH64_INT64_C (0)));
++ temp.val[3] = vcombine_s64 (val.val[3], vcreate_s64 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[3], 3);
++ __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vsqaddq_u8 (uint8x16_t __a, int8x16_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_u64 (uint64_t * __a, uint64x1x4_t val)
+ {
+- return __builtin_aarch64_usqaddv16qi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ uint64x2x4_t temp;
++ temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u64 (val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_u64 (val.val[3], vcreate_u64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[3], 3);
++ __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vsqaddq_u16 (uint16x8_t __a, int16x8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_f64 (float64_t * __a, float64x1x4_t val)
+ {
+- return __builtin_aarch64_usqaddv8hi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ float64x2x4_t temp;
++ temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f64 (val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_f64 (val.val[3], vcreate_f64 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[3], 3);
++ __builtin_aarch64_st4df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsqaddq_u32 (uint32x4_t __a, int32x4_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_s8 (int8_t * __a, int8x8x4_t val)
+ {
+- return __builtin_aarch64_usqaddv4si_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ int8x16x4_t temp;
++ temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s8 (val.val[2], vcreate_s8 (__AARCH64_INT64_C (0)));
++ temp.val[3] = vcombine_s8 (val.val[3], vcreate_s8 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
++ __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vsqaddq_u64 (uint64x2_t __a, int64x2_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_p8 (poly8_t * __a, poly8x8x4_t val)
+ {
+- return __builtin_aarch64_usqaddv2di_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ poly8x16x4_t temp;
++ temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_p8 (val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_p8 (val.val[3], vcreate_p8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
++ __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint8_t __attribute__ ((__always_inline__))
+-vsqaddb_u8 (uint8_t __a, int8_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_s16 (int16_t * __a, int16x4x4_t val)
+ {
+- return __builtin_aarch64_usqaddqi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ int16x8x4_t temp;
++ temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s16 (val.val[2], vcreate_s16 (__AARCH64_INT64_C (0)));
++ temp.val[3] = vcombine_s16 (val.val[3], vcreate_s16 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
++ __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint16_t __attribute__ ((__always_inline__))
+-vsqaddh_u16 (uint16_t __a, int16_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_p16 (poly16_t * __a, poly16x4x4_t val)
+ {
+- return __builtin_aarch64_usqaddhi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ poly16x8x4_t temp;
++ temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_p16 (val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_p16 (val.val[3], vcreate_p16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
++ __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vsqadds_u32 (uint32_t __a, int32_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_s32 (int32_t * __a, int32x2x4_t val)
+ {
+- return __builtin_aarch64_usqaddsi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ int32x4x4_t temp;
++ temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[2] = vcombine_s32 (val.val[2], vcreate_s32 (__AARCH64_INT64_C (0)));
++ temp.val[3] = vcombine_s32 (val.val[3], vcreate_s32 (__AARCH64_INT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[3], 3);
++ __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vsqaddd_u64 (uint64_t __a, int64_t __b)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_u8 (uint8_t * __a, uint8x8x4_t val)
+ {
+- return __builtin_aarch64_usqadddi_uus (__a, __b);
++ __builtin_aarch64_simd_xi __o;
++ uint8x16x4_t temp;
++ temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u8 (val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_u8 (val.val[3], vcreate_u8 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
++ __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-/* vsqrt */
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vsqrt_f32 (float32x2_t a)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_u16 (uint16_t * __a, uint16x4x4_t val)
+ {
+- return __builtin_aarch64_sqrtv2sf (a);
++ __builtin_aarch64_simd_xi __o;
++ uint16x8x4_t temp;
++ temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u16 (val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_u16 (val.val[3], vcreate_u16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
++ __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vsqrtq_f32 (float32x4_t a)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_u32 (uint32_t * __a, uint32x2x4_t val)
+ {
+- return __builtin_aarch64_sqrtv4sf (a);
++ __builtin_aarch64_simd_xi __o;
++ uint32x4x4_t temp;
++ temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_u32 (val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_u32 (val.val[3], vcreate_u32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[3], 3);
++ __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+-vsqrt_f64 (float64x1_t a)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_f16 (float16_t * __a, float16x4x4_t val)
+ {
+- return (float64x1_t) { __builtin_aarch64_sqrtdf (a[0]) };
++ __builtin_aarch64_simd_xi __o;
++ float16x8x4_t temp;
++ temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_f16 (val.val[3], vcreate_f16 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[3], 3);
++ __builtin_aarch64_st4v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vsqrtq_f64 (float64x2_t a)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4_f32 (float32_t * __a, float32x2x4_t val)
+ {
+- return __builtin_aarch64_sqrtv2df (a);
++ __builtin_aarch64_simd_xi __o;
++ float32x4x4_t temp;
++ temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[2] = vcombine_f32 (val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ temp.val[3] = vcombine_f32 (val.val[3], vcreate_f32 (__AARCH64_UINT64_C (0)));
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[3], 3);
++ __builtin_aarch64_st4v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
+-/* vsra */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vsra_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_s8 (int8_t * __a, int8x16x4_t val)
+ {
+- return (int8x8_t) __builtin_aarch64_ssra_nv8qi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
++ __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vsra_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_p8 (poly8_t * __a, poly8x16x4_t val)
+ {
+- return (int16x4_t) __builtin_aarch64_ssra_nv4hi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
++ __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vsra_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_s16 (int16_t * __a, int16x8x4_t val)
+ {
+- return (int32x2_t) __builtin_aarch64_ssra_nv2si (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
++ __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vsra_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_p16 (poly16_t * __a, poly16x8x4_t val)
+ {
+- return (int64x1_t) {__builtin_aarch64_ssra_ndi (__a[0], __b[0], __c)};
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
++ __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vsra_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_s32 (int32_t * __a, int32x4x4_t val)
+ {
+- return __builtin_aarch64_usra_nv8qi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3);
++ __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vsra_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_s64 (int64_t * __a, int64x2x4_t val)
+ {
+- return __builtin_aarch64_usra_nv4hi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3);
++ __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vsra_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_u8 (uint8_t * __a, uint8x16x4_t val)
+ {
+- return __builtin_aarch64_usra_nv2si_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
++ __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vsra_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_u16 (uint16_t * __a, uint16x8x4_t val)
+ {
+- return (uint64x1_t) {__builtin_aarch64_usra_ndi_uuus (__a[0], __b[0], __c)};
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
++ __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vsraq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_u32 (uint32_t * __a, uint32x4x4_t val)
+ {
+- return (int8x16_t) __builtin_aarch64_ssra_nv16qi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3);
++ __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vsraq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_u64 (uint64_t * __a, uint64x2x4_t val)
+ {
+- return (int16x8_t) __builtin_aarch64_ssra_nv8hi (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3);
++ __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vsraq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_f16 (float16_t * __a, float16x8x4_t val)
+ {
+- return (int32x4_t) __builtin_aarch64_ssra_nv4si (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[3], 3);
++ __builtin_aarch64_st4v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vsraq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_f32 (float32_t * __a, float32x4x4_t val)
+ {
+- return (int64x2_t) __builtin_aarch64_ssra_nv2di (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[3], 3);
++ __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vsraq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
++__extension__ extern __inline void
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vst4q_f64 (float64_t * __a, float64x2x4_t val)
+ {
+- return __builtin_aarch64_usra_nv16qi_uuus (__a, __b, __c);
++ __builtin_aarch64_simd_xi __o;
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[0], 0);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[1], 1);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[2], 2);
++ __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[3], 3);
++ __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __o);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vsraq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
++/* vsub */
++
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsubd_s64 (int64_t __a, int64_t __b)
+ {
+- return __builtin_aarch64_usra_nv8hi_uuus (__a, __b, __c);
++ return __a - __b;
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsraq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsubd_u64 (uint64_t __a, uint64_t __b)
+ {
+- return __builtin_aarch64_usra_nv4si_uuus (__a, __b, __c);
++ return __a - __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
++/* vtbx1 */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx1_s8 (int8x8_t __r, int8x8_t __tab, int8x8_t __idx)
+ {
+- return __builtin_aarch64_usra_nv2di_uuus (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (vreinterpret_u8_s8 (__idx),
++ vmov_n_u8 (8));
++ int8x8_t __tbl = vtbl1_s8 (__tab, __idx);
++
++ return vbsl_s8 (__mask, __tbl, __r);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vsrad_n_s64 (int64_t __a, int64_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx1_u8 (uint8x8_t __r, uint8x8_t __tab, uint8x8_t __idx)
+ {
+- return __builtin_aarch64_ssra_ndi (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (8));
++ uint8x8_t __tbl = vtbl1_u8 (__tab, __idx);
++
++ return vbsl_u8 (__mask, __tbl, __r);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx1_p8 (poly8x8_t __r, poly8x8_t __tab, uint8x8_t __idx)
+ {
+- return __builtin_aarch64_usra_ndi_uuus (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (8));
++ poly8x8_t __tbl = vtbl1_p8 (__tab, __idx);
++
++ return vbsl_p8 (__mask, __tbl, __r);
+ }
+
+-/* vsri */
++/* vtbx3 */
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vsri_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx3_s8 (int8x8_t __r, int8x8x3_t __tab, int8x8_t __idx)
+ {
+- return (int8x8_t) __builtin_aarch64_ssri_nv8qi (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (vreinterpret_u8_s8 (__idx),
++ vmov_n_u8 (24));
++ int8x8_t __tbl = vtbl3_s8 (__tab, __idx);
++
++ return vbsl_s8 (__mask, __tbl, __r);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vsri_n_s16 (int16x4_t __a, int16x4_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx3_u8 (uint8x8_t __r, uint8x8x3_t __tab, uint8x8_t __idx)
+ {
+- return (int16x4_t) __builtin_aarch64_ssri_nv4hi (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (24));
++ uint8x8_t __tbl = vtbl3_u8 (__tab, __idx);
++
++ return vbsl_u8 (__mask, __tbl, __r);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vsri_n_s32 (int32x2_t __a, int32x2_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx3_p8 (poly8x8_t __r, poly8x8x3_t __tab, uint8x8_t __idx)
+ {
+- return (int32x2_t) __builtin_aarch64_ssri_nv2si (__a, __b, __c);
++ uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (24));
++ poly8x8_t __tbl = vtbl3_p8 (__tab, __idx);
++
++ return vbsl_p8 (__mask, __tbl, __r);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vsri_n_s64 (int64x1_t __a, int64x1_t __b, const int __c)
++/* vtbx4 */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx4_s8 (int8x8_t __r, int8x8x4_t __tab, int8x8_t __idx)
+ {
+- return (int64x1_t) {__builtin_aarch64_ssri_ndi (__a[0], __b[0], __c)};
++ int8x8_t result;
++ int8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]);
++ temp.val[1] = vcombine_s8 (__tab.val[2], __tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = __builtin_aarch64_tbx4v8qi (__r, __o, __idx);
++ return result;
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vsri_n_u8 (uint8x8_t __a, uint8x8_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx4_u8 (uint8x8_t __r, uint8x8x4_t __tab, uint8x8_t __idx)
+ {
+- return __builtin_aarch64_usri_nv8qi_uuus (__a, __b, __c);
++ uint8x8_t result;
++ uint8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_u8 (__tab.val[0], __tab.val[1]);
++ temp.val[1] = vcombine_u8 (__tab.val[2], __tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
++ (int8x8_t)__idx);
++ return result;
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vsri_n_u16 (uint16x4_t __a, uint16x4_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx)
+ {
+- return __builtin_aarch64_usri_nv4hi_uuus (__a, __b, __c);
++ poly8x8_t result;
++ poly8x16x2_t temp;
++ __builtin_aarch64_simd_oi __o;
++ temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]);
++ temp.val[1] = vcombine_p8 (__tab.val[2], __tab.val[3]);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[0], 0);
++ __o = __builtin_aarch64_set_qregoiv16qi (__o,
++ (int8x16_t) temp.val[1], 1);
++ result = (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
++ (int8x8_t)__idx);
++ return result;
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vsri_n_u32 (uint32x2_t __a, uint32x2_t __b, const int __c)
++/* vtrn */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_usri_nv2si_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vsri_n_u64 (uint64x1_t __a, uint64x1_t __b, const int __c)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return (uint64x1_t) {__builtin_aarch64_usri_ndi_uuus (__a[0], __b[0], __c)};
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vsriq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- return (int8x16_t) __builtin_aarch64_ssri_nv16qi (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vsriq_n_s16 (int16x8_t __a, int16x8_t __b, const int __c)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+- return (int16x8_t) __builtin_aarch64_ssri_nv8hi (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vsriq_n_s32 (int32x4_t __a, int32x4_t __b, const int __c)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- return (int32x4_t) __builtin_aarch64_ssri_nv4si (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vsriq_n_s64 (int64x2_t __a, int64x2_t __b, const int __c)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- return (int64x2_t) __builtin_aarch64_ssri_nv2di (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vsriq_n_u8 (uint8x16_t __a, uint8x16_t __b, const int __c)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- return __builtin_aarch64_usri_nv16qi_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vsriq_n_u16 (uint16x8_t __a, uint16x8_t __b, const int __c)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- return __builtin_aarch64_usri_nv8hi_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsriq_n_u32 (uint32x4_t __a, uint32x4_t __b, const int __c)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- return __builtin_aarch64_usri_nv4si_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vsriq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- return __builtin_aarch64_usri_nv2di_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vsrid_n_s64 (int64_t __a, int64_t __b, const int __c)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return __builtin_aarch64_ssri_ndi (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vsrid_n_u64 (uint64_t __a, uint64_t __b, const int __c)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- return __builtin_aarch64_usri_ndi_uuus (__a, __b, __c);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-/* vst1 */
+-
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_f16 (float16_t *__a, float16x4_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- __builtin_aarch64_st1v4hf (__a, __b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_f32 (float32_t *a, float32x2_t b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+- __builtin_aarch64_st1v2sf ((__builtin_aarch64_simd_sf *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_f64 (float64_t *a, float64x1_t b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+- *a = b[0];
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_p8 (poly8_t *a, poly8x8_t b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a,
+- (int8x8_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_p16 (poly16_t *a, poly16x4_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a,
+- (int16x4_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_s8 (int8_t *a, int8x8_t b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_s16 (int16_t *a, int16x4_t b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_s32 (int32_t *a, int32x2_t b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- __builtin_aarch64_st1v2si ((__builtin_aarch64_simd_si *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_s64 (int64_t *a, int64x1_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- *a = b[0];
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_u8 (uint8_t *a, uint8x8_t b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- __builtin_aarch64_st1v8qi ((__builtin_aarch64_simd_qi *) a,
+- (int8x8_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_u16 (uint16_t *a, uint16x4_t b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn1q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- __builtin_aarch64_st1v4hi ((__builtin_aarch64_simd_hi *) a,
+- (int16x4_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_u32 (uint32_t *a, uint32x2_t b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- __builtin_aarch64_st1v2si ((__builtin_aarch64_simd_si *) a,
+- (int32x2_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_u64 (uint64_t *a, uint64x1_t b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- *a = b[0];
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-/* vst1q */
+-
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_f16 (float16_t *__a, float16x8_t __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- __builtin_aarch64_st1v8hf (__a, __b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_f32 (float32_t *a, float32x4_t b)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+- __builtin_aarch64_st1v4sf ((__builtin_aarch64_simd_sf *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_f64 (float64_t *a, float64x2_t b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- __builtin_aarch64_st1v2df ((__builtin_aarch64_simd_df *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_p8 (poly8_t *a, poly8x16_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a,
+- (int8x16_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_p16 (poly16_t *a, poly16x8_t b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a,
+- (int16x8_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_s8 (int8_t *a, int8x16_t b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_s16 (int16_t *a, int16x8_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_s32 (int32_t *a, int32x4_t b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- __builtin_aarch64_st1v4si ((__builtin_aarch64_simd_si *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_s64 (int64_t *a, int64x2_t b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- __builtin_aarch64_st1v2di ((__builtin_aarch64_simd_di *) a, b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_u8 (uint8_t *a, uint8x16_t b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- __builtin_aarch64_st1v16qi ((__builtin_aarch64_simd_qi *) a,
+- (int8x16_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_u16 (uint16_t *a, uint16x8_t b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- __builtin_aarch64_st1v8hi ((__builtin_aarch64_simd_hi *) a,
+- (int16x8_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_u32 (uint32_t *a, uint32x4_t b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+- __builtin_aarch64_st1v4si ((__builtin_aarch64_simd_si *) a,
+- (int32x4_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_u64 (uint64_t *a, uint64x2_t b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+- __builtin_aarch64_st1v2di ((__builtin_aarch64_simd_di *) a,
+- (int64x2_t) b);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-/* vst1_lane */
+-
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_f16 (float16_t *__a, float16x4_t __b, const int __lane)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_f32 (float32_t *__a, float32x2_t __b, const int __lane)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_f64 (float64_t *__a, float64x1_t __b, const int __lane)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_p8 (poly8_t *__a, poly8x8_t __b, const int __lane)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_p16 (poly16_t *__a, poly16x4_t __b, const int __lane)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_s8 (int8_t *__a, int8x8_t __b, const int __lane)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_s16 (int16_t *__a, int16x4_t __b, const int __lane)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_s32 (int32_t *__a, int32x2_t __b, const int __lane)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn2q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_s64 (int64_t *__a, int64x1_t __b, const int __lane)
++__extension__ extern __inline float16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (float16x4x2_t) {vtrn1_f16 (__a, __b), vtrn2_f16 (__a, __b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_u8 (uint8_t *__a, uint8x8_t __b, const int __lane)
++__extension__ extern __inline float32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_f32 (float32x2_t a, float32x2_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (float32x2x2_t) {vtrn1_f32 (a, b), vtrn2_f32 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_u16 (uint16_t *__a, uint16x4_t __b, const int __lane)
++__extension__ extern __inline poly8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_p8 (poly8x8_t a, poly8x8_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (poly8x8x2_t) {vtrn1_p8 (a, b), vtrn2_p8 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_u32 (uint32_t *__a, uint32x2_t __b, const int __lane)
++__extension__ extern __inline poly16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_p16 (poly16x4_t a, poly16x4_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (poly16x4x2_t) {vtrn1_p16 (a, b), vtrn2_p16 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1_lane_u64 (uint64_t *__a, uint64x1_t __b, const int __lane)
++__extension__ extern __inline int8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_s8 (int8x8_t a, int8x8_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int8x8x2_t) {vtrn1_s8 (a, b), vtrn2_s8 (a, b)};
+ }
+
+-/* vst1q_lane */
+-
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_f16 (float16_t *__a, float16x8_t __b, const int __lane)
++__extension__ extern __inline int16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_s16 (int16x4_t a, int16x4_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int16x4x2_t) {vtrn1_s16 (a, b), vtrn2_s16 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_f32 (float32_t *__a, float32x4_t __b, const int __lane)
++__extension__ extern __inline int32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_s32 (int32x2_t a, int32x2_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int32x2x2_t) {vtrn1_s32 (a, b), vtrn2_s32 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_f64 (float64_t *__a, float64x2_t __b, const int __lane)
++__extension__ extern __inline uint8x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_u8 (uint8x8_t a, uint8x8_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (uint8x8x2_t) {vtrn1_u8 (a, b), vtrn2_u8 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_p8 (poly8_t *__a, poly8x16_t __b, const int __lane)
++__extension__ extern __inline uint16x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_u16 (uint16x4_t a, uint16x4_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (uint16x4x2_t) {vtrn1_u16 (a, b), vtrn2_u16 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_p16 (poly16_t *__a, poly16x8_t __b, const int __lane)
++__extension__ extern __inline uint32x2x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrn_u32 (uint32x2_t a, uint32x2_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (uint32x2x2_t) {vtrn1_u32 (a, b), vtrn2_u32 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_s8 (int8_t *__a, int8x16_t __b, const int __lane)
++__extension__ extern __inline float16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (float16x8x2_t) {vtrn1q_f16 (__a, __b), vtrn2q_f16 (__a, __b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_s16 (int16_t *__a, int16x8_t __b, const int __lane)
++__extension__ extern __inline float32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_f32 (float32x4_t a, float32x4_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (float32x4x2_t) {vtrn1q_f32 (a, b), vtrn2q_f32 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_s32 (int32_t *__a, int32x4_t __b, const int __lane)
++__extension__ extern __inline poly8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_p8 (poly8x16_t a, poly8x16_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (poly8x16x2_t) {vtrn1q_p8 (a, b), vtrn2q_p8 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_s64 (int64_t *__a, int64x2_t __b, const int __lane)
++__extension__ extern __inline poly16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_p16 (poly16x8_t a, poly16x8_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (poly16x8x2_t) {vtrn1q_p16 (a, b), vtrn2q_p16 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_u8 (uint8_t *__a, uint8x16_t __b, const int __lane)
++__extension__ extern __inline int8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_s8 (int8x16_t a, int8x16_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int8x16x2_t) {vtrn1q_s8 (a, b), vtrn2q_s8 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_u16 (uint16_t *__a, uint16x8_t __b, const int __lane)
++__extension__ extern __inline int16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_s16 (int16x8_t a, int16x8_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int16x8x2_t) {vtrn1q_s16 (a, b), vtrn2q_s16 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_u32 (uint32_t *__a, uint32x4_t __b, const int __lane)
++__extension__ extern __inline int32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_s32 (int32x4_t a, int32x4_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (int32x4x2_t) {vtrn1q_s32 (a, b), vtrn2q_s32 (a, b)};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst1q_lane_u64 (uint64_t *__a, uint64x2_t __b, const int __lane)
++__extension__ extern __inline uint8x16x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_u8 (uint8x16_t a, uint8x16_t b)
+ {
+- *__a = __aarch64_vget_lane_any (__b, __lane);
++ return (uint8x16x2_t) {vtrn1q_u8 (a, b), vtrn2q_u8 (a, b)};
+ }
+
+-/* vstn */
+-
+-__extension__ static __inline void
+-vst2_s64 (int64_t * __a, int64x1x2_t val)
++__extension__ extern __inline uint16x8x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_u16 (uint16x8_t a, uint16x8_t b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- int64x2x2_t temp;
+- temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1);
+- __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o);
++ return (uint16x8x2_t) {vtrn1q_u16 (a, b), vtrn2q_u16 (a, b)};
+ }
+
+-__extension__ static __inline void
+-vst2_u64 (uint64_t * __a, uint64x1x2_t val)
++__extension__ extern __inline uint32x4x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtrnq_u32 (uint32x4_t a, uint32x4_t b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- uint64x2x2_t temp;
+- temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1);
+- __builtin_aarch64_st2di ((__builtin_aarch64_simd_di *) __a, __o);
++ return (uint32x4x2_t) {vtrn1q_u32 (a, b), vtrn2q_u32 (a, b)};
+ }
+
+-__extension__ static __inline void
+-vst2_f64 (float64_t * __a, float64x1x2_t val)
+-{
+- __builtin_aarch64_simd_oi __o;
+- float64x2x2_t temp;
+- temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[1], 1);
+- __builtin_aarch64_st2df ((__builtin_aarch64_simd_df *) __a, __o);
+-}
++/* vtst */
+
+-__extension__ static __inline void
+-vst2_s8 (int8_t * __a, int8x8x2_t val)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- int8x16x2_t temp;
+- temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return (uint8x8_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_p8 (poly8_t * __a, poly8x8x2_t val)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- poly8x16x2_t temp;
+- temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return (uint16x4_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_s16 (int16_t * __a, int16x4x2_t val)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- int16x8x2_t temp;
+- temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return (uint32x2_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_p16 (poly16_t * __a, poly16x4x2_t val)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_s64 (int64x1_t __a, int64x1_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- poly16x8x2_t temp;
+- temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return (uint64x1_t) ((__a & __b) != __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_s32 (int32_t * __a, int32x2x2_t val)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- int32x4x2_t temp;
+- temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1);
+- __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_u8 (uint8_t * __a, uint8x8x2_t val)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- uint8x16x2_t temp;
+- temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __builtin_aarch64_st2v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_u16 (uint16_t * __a, uint16x4x2_t val)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- uint16x8x2_t temp;
+- temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __builtin_aarch64_st2v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_u32 (uint32_t * __a, uint32x2x2_t val)
++__extension__ extern __inline uint64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtst_u64 (uint64x1_t __a, uint64x1_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- uint32x4x2_t temp;
+- temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1);
+- __builtin_aarch64_st2v2si ((__builtin_aarch64_simd_si *) __a, __o);
++ return ((__a & __b) != __AARCH64_UINT64_C (0));
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_f16 (float16_t * __a, float16x4x2_t val)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- float16x8x2_t temp;
+- temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[1], 1);
+- __builtin_aarch64_st2v4hf (__a, __o);
++ return (uint8x16_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2_f32 (float32_t * __a, float32x2x2_t val)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- float32x4x2_t temp;
+- temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[1], 1);
+- __builtin_aarch64_st2v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
++ return (uint16x8_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_s8 (int8_t * __a, int8x16x2_t val)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return (uint32x4_t) ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_p8 (poly8_t * __a, poly8x16x2_t val)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return (uint64x2_t) ((__a & __b) != __AARCH64_INT64_C (0));
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_s16 (int16_t * __a, int16x8x2_t val)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_p16 (poly16_t * __a, poly16x8x2_t val)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_s32 (int32_t * __a, int32x4x2_t val)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1);
+- __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
++ return ((__a & __b) != 0);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_s64 (int64_t * __a, int64x2x2_t val)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstq_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1);
+- __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o);
++ return ((__a & __b) != __AARCH64_UINT64_C (0));
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_u8 (uint8_t * __a, uint8x16x2_t val)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstd_s64 (int64_t __a, int64_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __builtin_aarch64_st2v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return (__a & __b) ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_u16 (uint16_t * __a, uint16x8x2_t val)
++__extension__ extern __inline uint64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vtstd_u64 (uint64_t __a, uint64_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __builtin_aarch64_st2v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return (__a & __b) ? -1ll : 0ll;
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_u32 (uint32_t * __a, uint32x4x2_t val)
++/* vuqadd */
++
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqadd_s8 (int8x8_t __a, uint8x8_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1);
+- __builtin_aarch64_st2v4si ((__builtin_aarch64_simd_si *) __a, __o);
++ return __builtin_aarch64_suqaddv8qi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_u64 (uint64_t * __a, uint64x2x2_t val)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqadd_s16 (int16x4_t __a, uint16x4_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1);
+- __builtin_aarch64_st2v2di ((__builtin_aarch64_simd_di *) __a, __o);
++ return __builtin_aarch64_suqaddv4hi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_f16 (float16_t * __a, float16x8x2_t val)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqadd_s32 (int32x2_t __a, uint32x2_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[1], 1);
+- __builtin_aarch64_st2v8hf (__a, __o);
++ return __builtin_aarch64_suqaddv2si_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_f32 (float32_t * __a, float32x4x2_t val)
++__extension__ extern __inline int64x1_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqadd_s64 (int64x1_t __a, uint64x1_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[1], 1);
+- __builtin_aarch64_st2v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
++ return (int64x1_t) {__builtin_aarch64_suqadddi_ssu (__a[0], __b[0])};
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst2q_f64 (float64_t * __a, float64x2x2_t val)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddq_s8 (int8x16_t __a, uint8x16_t __b)
+ {
+- __builtin_aarch64_simd_oi __o;
+- __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[1], 1);
+- __builtin_aarch64_st2v2df ((__builtin_aarch64_simd_df *) __a, __o);
++ return __builtin_aarch64_suqaddv16qi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void
+-vst3_s64 (int64_t * __a, int64x1x3_t val)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddq_s16 (int16x8_t __a, uint16x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- int64x2x3_t temp;
+- temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s64 (val.val[2], vcreate_s64 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2);
+- __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o);
++ return __builtin_aarch64_suqaddv8hi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void
+-vst3_u64 (uint64_t * __a, uint64x1x3_t val)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddq_s32 (int32x4_t __a, uint32x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- uint64x2x3_t temp;
+- temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u64 (val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2);
+- __builtin_aarch64_st3di ((__builtin_aarch64_simd_di *) __a, __o);
++ return __builtin_aarch64_suqaddv4si_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void
+-vst3_f64 (float64_t * __a, float64x1x3_t val)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddq_s64 (int64x2_t __a, uint64x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- float64x2x3_t temp;
+- temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f64 (val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[2], 2);
+- __builtin_aarch64_st3df ((__builtin_aarch64_simd_df *) __a, __o);
++ return __builtin_aarch64_suqaddv2di_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void
+-vst3_s8 (int8_t * __a, int8x8x3_t val)
++__extension__ extern __inline int8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddb_s8 (int8_t __a, uint8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- int8x16x3_t temp;
+- temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s8 (val.val[2], vcreate_s8 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return __builtin_aarch64_suqaddqi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_p8 (poly8_t * __a, poly8x8x3_t val)
++__extension__ extern __inline int16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddh_s16 (int16_t __a, uint16_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- poly8x16x3_t temp;
+- temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_p8 (val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++ return __builtin_aarch64_suqaddhi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_s16 (int16_t * __a, int16x4x3_t val)
++__extension__ extern __inline int32_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqadds_s32 (int32_t __a, uint32_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- int16x8x3_t temp;
+- temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s16 (val.val[2], vcreate_s16 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return __builtin_aarch64_suqaddsi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_p16 (poly16_t * __a, poly16x4x3_t val)
++__extension__ extern __inline int64_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuqaddd_s64 (int64_t __a, uint64_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- poly16x8x3_t temp;
+- temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_p16 (val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++ return __builtin_aarch64_suqadddi_ssu (__a, __b);
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_s32 (int32_t * __a, int32x2x3_t val)
++#define __DEFINTERLEAVE(op, rettype, intype, funcsuffix, Q) \
++ __extension__ extern __inline rettype \
++ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \
++ v ## op ## Q ## _ ## funcsuffix (intype a, intype b) \
++ { \
++ return (rettype) {v ## op ## 1 ## Q ## _ ## funcsuffix (a, b), \
++ v ## op ## 2 ## Q ## _ ## funcsuffix (a, b)}; \
++ }
++
++#define __INTERLEAVE_LIST(op) \
++ __DEFINTERLEAVE (op, float16x4x2_t, float16x4_t, f16,) \
++ __DEFINTERLEAVE (op, float32x2x2_t, float32x2_t, f32,) \
++ __DEFINTERLEAVE (op, poly8x8x2_t, poly8x8_t, p8,) \
++ __DEFINTERLEAVE (op, poly16x4x2_t, poly16x4_t, p16,) \
++ __DEFINTERLEAVE (op, int8x8x2_t, int8x8_t, s8,) \
++ __DEFINTERLEAVE (op, int16x4x2_t, int16x4_t, s16,) \
++ __DEFINTERLEAVE (op, int32x2x2_t, int32x2_t, s32,) \
++ __DEFINTERLEAVE (op, uint8x8x2_t, uint8x8_t, u8,) \
++ __DEFINTERLEAVE (op, uint16x4x2_t, uint16x4_t, u16,) \
++ __DEFINTERLEAVE (op, uint32x2x2_t, uint32x2_t, u32,) \
++ __DEFINTERLEAVE (op, float16x8x2_t, float16x8_t, f16, q) \
++ __DEFINTERLEAVE (op, float32x4x2_t, float32x4_t, f32, q) \
++ __DEFINTERLEAVE (op, poly8x16x2_t, poly8x16_t, p8, q) \
++ __DEFINTERLEAVE (op, poly16x8x2_t, poly16x8_t, p16, q) \
++ __DEFINTERLEAVE (op, int8x16x2_t, int8x16_t, s8, q) \
++ __DEFINTERLEAVE (op, int16x8x2_t, int16x8_t, s16, q) \
++ __DEFINTERLEAVE (op, int32x4x2_t, int32x4_t, s32, q) \
++ __DEFINTERLEAVE (op, uint8x16x2_t, uint8x16_t, u8, q) \
++ __DEFINTERLEAVE (op, uint16x8x2_t, uint16x8_t, u16, q) \
++ __DEFINTERLEAVE (op, uint32x4x2_t, uint32x4_t, u32, q)
++
++/* vuzp */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- int32x4x3_t temp;
+- temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s32 (val.val[2], vcreate_s32 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2);
+- __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_u8 (uint8_t * __a, uint8x8x3_t val)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- uint8x16x3_t temp;
+- temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u8 (val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __builtin_aarch64_st3v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_u16 (uint16_t * __a, uint16x4x3_t val)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- uint16x8x3_t temp;
+- temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u16 (val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __builtin_aarch64_st3v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_u32 (uint32_t * __a, uint32x2x3_t val)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- uint32x4x3_t temp;
+- temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u32 (val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2);
+- __builtin_aarch64_st3v2si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_f16 (float16_t * __a, float16x4x3_t val)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- float16x8x3_t temp;
+- temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[2], 2);
+- __builtin_aarch64_st3v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3_f32 (float32_t * __a, float32x2x3_t val)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- float32x4x3_t temp;
+- temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f32 (val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[2], 2);
+- __builtin_aarch64_st3v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_s8 (int8_t * __a, int8x16x3_t val)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
+- __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_p8 (poly8_t * __a, poly8x16x3_t val)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
+- __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_s16 (int16_t * __a, int16x8x3_t val)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
+- __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_p16 (poly16_t * __a, poly16x8x3_t val)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
+- __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_s32 (int32_t * __a, int32x4x3_t val)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2);
+- __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_s64 (int64_t * __a, int64x2x3_t val)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2);
+- __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_u8 (uint8_t * __a, uint8x16x3_t val)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2);
+- __builtin_aarch64_st3v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_u16 (uint16_t * __a, uint16x8x3_t val)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2);
+- __builtin_aarch64_st3v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_u32 (uint32_t * __a, uint32x4x3_t val)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2);
+- __builtin_aarch64_st3v4si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_u64 (uint64_t * __a, uint64x2x3_t val)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2);
+- __builtin_aarch64_st3v2di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_f16 (float16_t * __a, float16x8x3_t val)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[2], 2);
+- __builtin_aarch64_st3v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_f32 (float32_t * __a, float32x4x3_t val)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[2], 2);
+- __builtin_aarch64_st3v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst3q_f64 (float64_t * __a, float64x2x3_t val)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- __builtin_aarch64_simd_ci __o;
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[2], 2);
+- __builtin_aarch64_st3v2df ((__builtin_aarch64_simd_df *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline void
+-vst4_s64 (int64_t * __a, int64x1x4_t val)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- int64x2x4_t temp;
+- temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s64 (val.val[2], vcreate_s64 (__AARCH64_INT64_C (0)));
+- temp.val[3] = vcombine_s64 (val.val[3], vcreate_s64 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[3], 3);
+- __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
++#endif
+ }
+
+-__extension__ static __inline void
+-vst4_u64 (uint64_t * __a, uint64x1x4_t val)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- uint64x2x4_t temp;
+- temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u64 (val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_u64 (val.val[3], vcreate_u64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) temp.val[3], 3);
+- __builtin_aarch64_st4di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
++#endif
+ }
+
+-__extension__ static __inline void
+-vst4_f64 (float64_t * __a, float64x1x4_t val)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- float64x2x4_t temp;
+- temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f64 (val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_f64 (val.val[3], vcreate_f64 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) temp.val[3], 3);
+- __builtin_aarch64_st4df ((__builtin_aarch64_simd_df *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
++#endif
+ }
+
+-__extension__ static __inline void
+-vst4_s8 (int8_t * __a, int8x8x4_t val)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp1q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- int8x16x4_t temp;
+- temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s8 (val.val[2], vcreate_s8 (__AARCH64_INT64_C (0)));
+- temp.val[3] = vcombine_s8 (val.val[3], vcreate_s8 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
+- __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
++}
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_f16 (float16x4_t __a, float16x4_t __b)
++{
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_p8 (poly8_t * __a, poly8x8x4_t val)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- poly8x16x4_t temp;
+- temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_p8 (val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_p8 (val.val[3], vcreate_p8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
+- __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_s16 (int16_t * __a, int16x4x4_t val)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- int16x8x4_t temp;
+- temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s16 (val.val[2], vcreate_s16 (__AARCH64_INT64_C (0)));
+- temp.val[3] = vcombine_s16 (val.val[3], vcreate_s16 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
+- __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_p16 (poly16_t * __a, poly16x4x4_t val)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- poly16x8x4_t temp;
+- temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_p16 (val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_p16 (val.val[3], vcreate_p16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
+- __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_s32 (int32_t * __a, int32x2x4_t val)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- int32x4x4_t temp;
+- temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[2] = vcombine_s32 (val.val[2], vcreate_s32 (__AARCH64_INT64_C (0)));
+- temp.val[3] = vcombine_s32 (val.val[3], vcreate_s32 (__AARCH64_INT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[3], 3);
+- __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_u8 (uint8_t * __a, uint8x8x4_t val)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- uint8x16x4_t temp;
+- temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u8 (val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_u8 (val.val[3], vcreate_u8 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) temp.val[3], 3);
+- __builtin_aarch64_st4v8qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_u16 (uint16_t * __a, uint16x4x4_t val)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- uint16x8x4_t temp;
+- temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u16 (val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_u16 (val.val[3], vcreate_u16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) temp.val[3], 3);
+- __builtin_aarch64_st4v4hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_u32 (uint32_t * __a, uint32x2x4_t val)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- uint32x4x4_t temp;
+- temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_u32 (val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_u32 (val.val[3], vcreate_u32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) temp.val[3], 3);
+- __builtin_aarch64_st4v2si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_f16 (float16_t * __a, float16x4x4_t val)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- float16x8x4_t temp;
+- temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_f16 (val.val[3], vcreate_f16 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) temp.val[3], 3);
+- __builtin_aarch64_st4v4hf ((__builtin_aarch64_simd_hf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4_f32 (float32_t * __a, float32x2x4_t val)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- float32x4x4_t temp;
+- temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[2] = vcombine_f32 (val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- temp.val[3] = vcombine_f32 (val.val[3], vcreate_f32 (__AARCH64_UINT64_C (0)));
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) temp.val[3], 3);
+- __builtin_aarch64_st4v2sf ((__builtin_aarch64_simd_sf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_s8 (int8_t * __a, int8x16x4_t val)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
+- __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_p8 (poly8_t * __a, poly8x16x4_t val)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
+- __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_s16 (int16_t * __a, int16x8x4_t val)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
+- __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_p16 (poly16_t * __a, poly16x8x4_t val)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
+- __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_s32 (int32_t * __a, int32x4x4_t val)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3);
+- __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_s64 (int64_t * __a, int64x2x4_t val)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3);
+- __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint8x16_t) {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_u8 (uint8_t * __a, uint8x16x4_t val)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t) val.val[3], 3);
+- __builtin_aarch64_st4v16qi ((__builtin_aarch64_simd_qi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_u16 (uint16_t * __a, uint16x8x4_t val)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hi (__o, (int16x8_t) val.val[3], 3);
+- __builtin_aarch64_st4v8hi ((__builtin_aarch64_simd_hi *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_u32 (uint32_t * __a, uint32x4x4_t val)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4si (__o, (int32x4_t) val.val[3], 3);
+- __builtin_aarch64_st4v4si ((__builtin_aarch64_simd_si *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_u64 (uint64_t * __a, uint64x2x4_t val)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2di (__o, (int64x2_t) val.val[3], 3);
+- __builtin_aarch64_st4v2di ((__builtin_aarch64_simd_di *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_f16 (float16_t * __a, float16x8x4_t val)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv8hf (__o, (float16x8_t) val.val[3], 3);
+- __builtin_aarch64_st4v8hf ((__builtin_aarch64_simd_hf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_f32 (float32_t * __a, float32x4x4_t val)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv4sf (__o, (float32x4_t) val.val[3], 3);
+- __builtin_aarch64_st4v4sf ((__builtin_aarch64_simd_sf *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
++#endif
+ }
+
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vst4q_f64 (float64_t * __a, float64x2x4_t val)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vuzp2q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+- __builtin_aarch64_simd_xi __o;
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[0], 0);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[1], 1);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[2], 2);
+- __o = __builtin_aarch64_set_qregxiv2df (__o, (float64x2_t) val.val[3], 3);
+- __builtin_aarch64_st4v2df ((__builtin_aarch64_simd_df *) __a, __o);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
++#endif
+ }
+
+-/* vsub */
++__INTERLEAVE_LIST (uzp)
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vsubd_s64 (int64_t __a, int64_t __b)
++/* vzip */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __a - __b;
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
++#endif
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vsubd_u64 (uint64_t __a, uint64_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_f32 (float32x2_t __a, float32x2_t __b)
+ {
+- return __a - __b;
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-/* vtbx1 */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbx1_s8 (int8x8_t __r, int8x8_t __tab, int8x8_t __idx)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (vreinterpret_u8_s8 (__idx),
+- vmov_n_u8 (8));
+- int8x8_t __tbl = vtbl1_s8 (__tab, __idx);
+-
+- return vbsl_s8 (__mask, __tbl, __r);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
++#endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbx1_u8 (uint8x8_t __r, uint8x8_t __tab, uint8x8_t __idx)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (8));
+- uint8x8_t __tbl = vtbl1_u8 (__tab, __idx);
+-
+- return vbsl_u8 (__mask, __tbl, __r);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
++#endif
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbx1_p8 (poly8x8_t __r, poly8x8_t __tab, uint8x8_t __idx)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_s8 (int8x8_t __a, int8x8_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (8));
+- poly8x8_t __tbl = vtbl1_p8 (__tab, __idx);
+-
+- return vbsl_p8 (__mask, __tbl, __r);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
++#endif
+ }
+
+-/* vtbx3 */
+-
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbx3_s8 (int8x8_t __r, int8x8x3_t __tab, int8x8_t __idx)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_s16 (int16x4_t __a, int16x4_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (vreinterpret_u8_s8 (__idx),
+- vmov_n_u8 (24));
+- int8x8_t __tbl = vtbl3_s8 (__tab, __idx);
+-
+- return vbsl_s8 (__mask, __tbl, __r);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
++#endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbx3_u8 (uint8x8_t __r, uint8x8x3_t __tab, uint8x8_t __idx)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_s32 (int32x2_t __a, int32x2_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (24));
+- uint8x8_t __tbl = vtbl3_u8 (__tab, __idx);
+-
+- return vbsl_u8 (__mask, __tbl, __r);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbx3_p8 (poly8x8_t __r, poly8x8x3_t __tab, uint8x8_t __idx)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+- uint8x8_t __mask = vclt_u8 (__idx, vmov_n_u8 (24));
+- poly8x8_t __tbl = vtbl3_p8 (__tab, __idx);
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
++#endif
++}
+
+- return vbsl_p8 (__mask, __tbl, __r);
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_u16 (uint16x4_t __a, uint16x4_t __b)
++{
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
++#endif
+ }
+
+-/* vtbx4 */
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1_u32 (uint32x2_t __a, uint32x2_t __b)
++{
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++#endif
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtbx4_s8 (int8x8_t __r, int8x8x4_t __tab, int8x8_t __idx)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- int8x8_t result;
+- int8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_s8 (__tab.val[0], __tab.val[1]);
+- temp.val[1] = vcombine_s8 (__tab.val[2], __tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = __builtin_aarch64_tbx4v8qi (__r, __o, __idx);
+- return result;
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
++#else
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
++#endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtbx4_u8 (uint8x8_t __r, uint8x8x4_t __tab, uint8x8_t __idx)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+- uint8x8_t result;
+- uint8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_u8 (__tab.val[0], __tab.val[1]);
+- temp.val[1] = vcombine_u8 (__tab.val[2], __tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
+- (int8x8_t)__idx);
+- return result;
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
++#else
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
++#endif
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtbx4_p8 (poly8x8_t __r, poly8x8x4_t __tab, uint8x8_t __idx)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+- poly8x8_t result;
+- poly8x16x2_t temp;
+- __builtin_aarch64_simd_oi __o;
+- temp.val[0] = vcombine_p8 (__tab.val[0], __tab.val[1]);
+- temp.val[1] = vcombine_p8 (__tab.val[2], __tab.val[3]);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[0], 0);
+- __o = __builtin_aarch64_set_qregoiv16qi (__o,
+- (int8x16_t) temp.val[1], 1);
+- result = (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)__r, __o,
+- (int8x8_t)__idx);
+- return result;
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++#else
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++#endif
+ }
+
+-/* vtrn */
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_p8 (poly8x16_t __a, poly8x16_t __b)
++{
++#ifdef __AARCH64EB__
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
++#else
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
++#endif
++}
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vtrn1_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {12, 4, 13, 5, 14, 6, 15, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+ #endif
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtrn1_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
+ #endif
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vtrn1_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {12, 4, 13, 5, 14, 6, 15, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+ #endif
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtrn1_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
+ #endif
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vtrn1_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+ #endif
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vtrn1_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
+ #endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtrn1_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {12, 4, 13, 5, 14, 6, 15, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+ #endif
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vtrn1_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
+ #endif
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vtrn1_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip1q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+ #endif
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vtrn1q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_f16 (float16x4_t __a, float16x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vtrn1q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_f32 (float32x2_t __a, float32x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+ #else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+ #endif
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vtrn1q_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline poly8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_p8 (poly8x8_t __a, poly8x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vtrn1q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline poly16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_p16 (poly16x4_t __a, poly16x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vtrn1q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline int8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_s8 (int8x8_t __a, int8x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vtrn1q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_s16 (int16x4_t __a, int16x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vtrn1q_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_s32 (int32x2_t __a, int32x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+ #endif
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vtrn1q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline uint8x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_u8 (uint8x8_t __a, uint8x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vtrn1q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_u16 (uint16x4_t __a, uint16x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {0, 16, 2, 18, 4, 20, 6, 22, 8, 24, 10, 26, 12, 28, 14, 30});
++ return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vtrn1q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint32x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2_u32 (uint32x2_t __a, uint32x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 1, 11, 3, 13, 5, 15, 7});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 2, 10, 4, 12, 6, 14});
++ return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+ #endif
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vtrn1q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_f16 (float16x8_t __a, float16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 1, 7, 3});
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 2, 6});
++ return __builtin_shuffle (__a, __b,
++ (uint16x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vtrn1q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_f32 (float32x4_t __a, float32x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vtrn2_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_f64 (float64x2_t __a, float64x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+ #endif
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vtrn2_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline poly8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_p8 (poly8x16_t __a, poly8x16_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+ #endif
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vtrn2_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline poly16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_p16 (poly16x8_t __a, poly16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vtrn2_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline int8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_s8 (int8x16_t __a, int8x16_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+ #endif
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vtrn2_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_s16 (int16x8_t __a, int16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vtrn2_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline int32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_s32 (int32x4_t __a, int32x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtrn2_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline int64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_s64 (int64x2_t __a, int64x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+ #else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
++ return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+ #endif
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vtrn2_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline uint8x16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_u8 (uint8x16_t __a, uint8x16_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 6, 2});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+ #else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 5, 3, 7});
++ return __builtin_shuffle (__a, __b, (uint8x16_t)
++ {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+ #endif
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vtrn2_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_u16 (uint16x8_t __a, uint16x8_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
++ return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
++ return __builtin_shuffle (__a, __b, (uint16x8_t)
++ {4, 12, 5, 13, 6, 14, 7, 15});
+ #endif
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vtrn2q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline uint32x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_u32 (uint32x4_t __a, uint32x4_t __b)
+ {
+ #ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+ #else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
++ return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+ #endif
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vtrn2q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline uint64x2_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vzip2q_u64 (uint64x2_t __a, uint64x2_t __b)
+ {
+ #ifdef __AARCH64EB__
+ return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+@@ -24455,1319 +29209,1184 @@ vtrn2q_f64 (float64x2_t __a, float64x2_t __b)
+ #endif
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vtrn2q_p8 (poly8x16_t __a, poly8x16_t __b)
++__INTERLEAVE_LIST (zip)
++
++#undef __INTERLEAVE_LIST
++#undef __DEFINTERLEAVE
++
++/* End of optimal implementations in approved order. */
++
++#pragma GCC pop_options
++
++/* ARMv8.2-A FP16 intrinsics. */
++
++#include "arm_fp16.h"
++
++#pragma GCC push_options
++#pragma GCC target ("arch=armv8.2-a+fp16")
++
++/* ARMv8.2-A FP16 one operand vector intrinsics. */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabs_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_absv4hf (__a);
++}
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabsq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_absv8hf (__a);
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqz_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_cmeqv4hf_uss (__a, vdup_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqzq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_cmeqv8hf_uss (__a, vdupq_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgez_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_cmgev4hf_uss (__a, vdup_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgezq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_cmgev8hf_uss (__a, vdupq_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtz_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_cmgtv4hf_uss (__a, vdup_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtzq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_cmgtv8hf_uss (__a, vdupq_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclez_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_cmlev4hf_uss (__a, vdup_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclezq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_cmlev8hf_uss (__a, vdupq_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltz_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_cmltv4hf_uss (__a, vdup_n_f16 (0.0f));
++}
++
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltzq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_cmltv8hf_uss (__a, vdupq_n_f16 (0.0f));
++}
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f16_s16 (int16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
+-#endif
++ return __builtin_aarch64_floatv4hiv4hf (__a);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vtrn2q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f16_s16 (int16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
+-#endif
++ return __builtin_aarch64_floatv8hiv8hf (__a);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vtrn2q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_f16_u16 (uint16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
+-#endif
++ return __builtin_aarch64_floatunsv4hiv4hf ((int16x4_t) __a);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vtrn2q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_f16_u16 (uint16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
+-#endif
++ return __builtin_aarch64_floatunsv8hiv8hf ((int16x8_t) __a);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vtrn2q_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_s16_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
+-#endif
++ return __builtin_aarch64_lbtruncv4hfv4hi (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vtrn2q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_s16_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_lbtruncv8hfv8hi (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vtrn2q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_u16_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {16, 0, 18, 2, 20, 4, 22, 6, 24, 8, 26, 10, 28, 12, 30, 14});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {1, 17, 3, 19, 5, 21, 7, 23, 9, 25, 11, 27, 13, 29, 15, 31});
+-#endif
++ return __builtin_aarch64_lbtruncuv4hfv4hi_us (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vtrn2q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_u16_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 10, 2, 12, 4, 14, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 9, 3, 11, 5, 13, 7, 15});
+-#endif
++ return __builtin_aarch64_lbtruncuv8hfv8hi_us (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vtrn2q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_s16_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 6, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 5, 3, 7});
+-#endif
++ return __builtin_aarch64_lroundv4hfv4hi (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vtrn2q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_s16_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_lroundv8hfv8hi (__a);
+ }
+
+-__extension__ static __inline float32x2x2_t __attribute__ ((__always_inline__))
+-vtrn_f32 (float32x2_t a, float32x2_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvta_u16_f16 (float16x4_t __a)
+ {
+- return (float32x2x2_t) {vtrn1_f32 (a, b), vtrn2_f32 (a, b)};
++ return __builtin_aarch64_lrounduv4hfv4hi_us (__a);
+ }
+
+-__extension__ static __inline poly8x8x2_t __attribute__ ((__always_inline__))
+-vtrn_p8 (poly8x8_t a, poly8x8_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtaq_u16_f16 (float16x8_t __a)
+ {
+- return (poly8x8x2_t) {vtrn1_p8 (a, b), vtrn2_p8 (a, b)};
++ return __builtin_aarch64_lrounduv8hfv8hi_us (__a);
+ }
+
+-__extension__ static __inline poly16x4x2_t __attribute__ ((__always_inline__))
+-vtrn_p16 (poly16x4_t a, poly16x4_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_s16_f16 (float16x4_t __a)
+ {
+- return (poly16x4x2_t) {vtrn1_p16 (a, b), vtrn2_p16 (a, b)};
++ return __builtin_aarch64_lfloorv4hfv4hi (__a);
+ }
+
+-__extension__ static __inline int8x8x2_t __attribute__ ((__always_inline__))
+-vtrn_s8 (int8x8_t a, int8x8_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_s16_f16 (float16x8_t __a)
+ {
+- return (int8x8x2_t) {vtrn1_s8 (a, b), vtrn2_s8 (a, b)};
++ return __builtin_aarch64_lfloorv8hfv8hi (__a);
+ }
+
+-__extension__ static __inline int16x4x2_t __attribute__ ((__always_inline__))
+-vtrn_s16 (int16x4_t a, int16x4_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtm_u16_f16 (float16x4_t __a)
+ {
+- return (int16x4x2_t) {vtrn1_s16 (a, b), vtrn2_s16 (a, b)};
++ return __builtin_aarch64_lflooruv4hfv4hi_us (__a);
+ }
+
+-__extension__ static __inline int32x2x2_t __attribute__ ((__always_inline__))
+-vtrn_s32 (int32x2_t a, int32x2_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtmq_u16_f16 (float16x8_t __a)
+ {
+- return (int32x2x2_t) {vtrn1_s32 (a, b), vtrn2_s32 (a, b)};
++ return __builtin_aarch64_lflooruv8hfv8hi_us (__a);
+ }
+
+-__extension__ static __inline uint8x8x2_t __attribute__ ((__always_inline__))
+-vtrn_u8 (uint8x8_t a, uint8x8_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_s16_f16 (float16x4_t __a)
+ {
+- return (uint8x8x2_t) {vtrn1_u8 (a, b), vtrn2_u8 (a, b)};
++ return __builtin_aarch64_lfrintnv4hfv4hi (__a);
+ }
+
+-__extension__ static __inline uint16x4x2_t __attribute__ ((__always_inline__))
+-vtrn_u16 (uint16x4_t a, uint16x4_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_s16_f16 (float16x8_t __a)
+ {
+- return (uint16x4x2_t) {vtrn1_u16 (a, b), vtrn2_u16 (a, b)};
++ return __builtin_aarch64_lfrintnv8hfv8hi (__a);
+ }
+
+-__extension__ static __inline uint32x2x2_t __attribute__ ((__always_inline__))
+-vtrn_u32 (uint32x2_t a, uint32x2_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtn_u16_f16 (float16x4_t __a)
+ {
+- return (uint32x2x2_t) {vtrn1_u32 (a, b), vtrn2_u32 (a, b)};
++ return __builtin_aarch64_lfrintnuv4hfv4hi_us (__a);
+ }
+
+-__extension__ static __inline float32x4x2_t __attribute__ ((__always_inline__))
+-vtrnq_f32 (float32x4_t a, float32x4_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtnq_u16_f16 (float16x8_t __a)
+ {
+- return (float32x4x2_t) {vtrn1q_f32 (a, b), vtrn2q_f32 (a, b)};
++ return __builtin_aarch64_lfrintnuv8hfv8hi_us (__a);
+ }
+
+-__extension__ static __inline poly8x16x2_t __attribute__ ((__always_inline__))
+-vtrnq_p8 (poly8x16_t a, poly8x16_t b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_s16_f16 (float16x4_t __a)
+ {
+- return (poly8x16x2_t) {vtrn1q_p8 (a, b), vtrn2q_p8 (a, b)};
++ return __builtin_aarch64_lceilv4hfv4hi (__a);
+ }
+
+-__extension__ static __inline poly16x8x2_t __attribute__ ((__always_inline__))
+-vtrnq_p16 (poly16x8_t a, poly16x8_t b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_s16_f16 (float16x8_t __a)
+ {
+- return (poly16x8x2_t) {vtrn1q_p16 (a, b), vtrn2q_p16 (a, b)};
++ return __builtin_aarch64_lceilv8hfv8hi (__a);
+ }
+
+-__extension__ static __inline int8x16x2_t __attribute__ ((__always_inline__))
+-vtrnq_s8 (int8x16_t a, int8x16_t b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtp_u16_f16 (float16x4_t __a)
+ {
+- return (int8x16x2_t) {vtrn1q_s8 (a, b), vtrn2q_s8 (a, b)};
++ return __builtin_aarch64_lceiluv4hfv4hi_us (__a);
+ }
+
+-__extension__ static __inline int16x8x2_t __attribute__ ((__always_inline__))
+-vtrnq_s16 (int16x8_t a, int16x8_t b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtpq_u16_f16 (float16x8_t __a)
+ {
+- return (int16x8x2_t) {vtrn1q_s16 (a, b), vtrn2q_s16 (a, b)};
++ return __builtin_aarch64_lceiluv8hfv8hi_us (__a);
+ }
+
+-__extension__ static __inline int32x4x2_t __attribute__ ((__always_inline__))
+-vtrnq_s32 (int32x4_t a, int32x4_t b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vneg_f16 (float16x4_t __a)
+ {
+- return (int32x4x2_t) {vtrn1q_s32 (a, b), vtrn2q_s32 (a, b)};
++ return -__a;
+ }
+
+-__extension__ static __inline uint8x16x2_t __attribute__ ((__always_inline__))
+-vtrnq_u8 (uint8x16_t a, uint8x16_t b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vnegq_f16 (float16x8_t __a)
+ {
+- return (uint8x16x2_t) {vtrn1q_u8 (a, b), vtrn2q_u8 (a, b)};
++ return -__a;
+ }
+
+-__extension__ static __inline uint16x8x2_t __attribute__ ((__always_inline__))
+-vtrnq_u16 (uint16x8_t a, uint16x8_t b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpe_f16 (float16x4_t __a)
+ {
+- return (uint16x8x2_t) {vtrn1q_u16 (a, b), vtrn2q_u16 (a, b)};
++ return __builtin_aarch64_frecpev4hf (__a);
+ }
+
+-__extension__ static __inline uint32x4x2_t __attribute__ ((__always_inline__))
+-vtrnq_u32 (uint32x4_t a, uint32x4_t b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpeq_f16 (float16x8_t __a)
+ {
+- return (uint32x4x2_t) {vtrn1q_u32 (a, b), vtrn2q_u32 (a, b)};
++ return __builtin_aarch64_frecpev8hf (__a);
+ }
+
+-/* vtst */
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnd_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_btruncv4hf (__a);
++}
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtst_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndq_f16 (float16x8_t __a)
+ {
+- return (uint8x8_t) ((__a & __b) != 0);
++ return __builtin_aarch64_btruncv8hf (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vtst_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrnda_f16 (float16x4_t __a)
+ {
+- return (uint16x4_t) ((__a & __b) != 0);
++ return __builtin_aarch64_roundv4hf (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vtst_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndaq_f16 (float16x8_t __a)
+ {
+- return (uint32x2_t) ((__a & __b) != 0);
++ return __builtin_aarch64_roundv8hf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vtst_s64 (int64x1_t __a, int64x1_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndi_f16 (float16x4_t __a)
+ {
+- return (uint64x1_t) ((__a & __b) != __AARCH64_INT64_C (0));
++ return __builtin_aarch64_nearbyintv4hf (__a);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vtst_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndiq_f16 (float16x8_t __a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_nearbyintv8hf (__a);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vtst_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndm_f16 (float16x4_t __a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_floorv4hf (__a);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vtst_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndmq_f16 (float16x8_t __a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_floorv8hf (__a);
+ }
+
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vtst_u64 (uint64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndn_f16 (float16x4_t __a)
+ {
+- return ((__a & __b) != __AARCH64_UINT64_C (0));
++ return __builtin_aarch64_frintnv4hf (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vtstq_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndnq_f16 (float16x8_t __a)
+ {
+- return (uint8x16_t) ((__a & __b) != 0);
++ return __builtin_aarch64_frintnv8hf (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vtstq_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndp_f16 (float16x4_t __a)
+ {
+- return (uint16x8_t) ((__a & __b) != 0);
++ return __builtin_aarch64_ceilv4hf (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vtstq_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndpq_f16 (float16x8_t __a)
+ {
+- return (uint32x4_t) ((__a & __b) != 0);
++ return __builtin_aarch64_ceilv8hf (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vtstq_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndx_f16 (float16x4_t __a)
+ {
+- return (uint64x2_t) ((__a & __b) != __AARCH64_INT64_C (0));
++ return __builtin_aarch64_rintv4hf (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vtstq_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrndxq_f16 (float16x8_t __a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_rintv8hf (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vtstq_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrte_f16 (float16x4_t a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_rsqrtev4hf (a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vtstq_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrteq_f16 (float16x8_t a)
+ {
+- return ((__a & __b) != 0);
++ return __builtin_aarch64_rsqrtev8hf (a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vtstq_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrt_f16 (float16x4_t a)
+ {
+- return ((__a & __b) != __AARCH64_UINT64_C (0));
++ return __builtin_aarch64_sqrtv4hf (a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vtstd_s64 (int64_t __a, int64_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsqrtq_f16 (float16x8_t a)
+ {
+- return (__a & __b) ? -1ll : 0ll;
++ return __builtin_aarch64_sqrtv8hf (a);
+ }
+
+-__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+-vtstd_u64 (uint64_t __a, uint64_t __b)
++/* ARMv8.2-A FP16 two operands vector intrinsics. */
++
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vadd_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return (__a & __b) ? -1ll : 0ll;
++ return __a + __b;
+ }
+
+-/* vuqadd */
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vaddq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __a + __b;
++}
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vuqadd_s8 (int8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabd_f16 (float16x4_t a, float16x4_t b)
+ {
+- return __builtin_aarch64_suqaddv8qi_ssu (__a, __b);
++ return __builtin_aarch64_fabdv4hf (a, b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vuqadd_s16 (int16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vabdq_f16 (float16x8_t a, float16x8_t b)
+ {
+- return __builtin_aarch64_suqaddv4hi_ssu (__a, __b);
++ return __builtin_aarch64_fabdv8hf (a, b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vuqadd_s32 (int32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcage_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_suqaddv2si_ssu (__a, __b);
++ return __builtin_aarch64_facgev4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+-vuqadd_s64 (int64x1_t __a, uint64x1_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcageq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return (int64x1_t) {__builtin_aarch64_suqadddi_ssu (__a[0], __b[0])};
++ return __builtin_aarch64_facgev8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vuqaddq_s8 (int8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagt_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_suqaddv16qi_ssu (__a, __b);
++ return __builtin_aarch64_facgtv4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vuqaddq_s16 (int16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return __builtin_aarch64_suqaddv8hi_ssu (__a, __b);
++ return __builtin_aarch64_facgtv8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vuqaddq_s32 (int32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcale_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_suqaddv4si_ssu (__a, __b);
++ return __builtin_aarch64_faclev4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vuqaddq_s64 (int64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return __builtin_aarch64_suqaddv2di_ssu (__a, __b);
++ return __builtin_aarch64_faclev8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int8_t __attribute__ ((__always_inline__))
+-vuqaddb_s8 (int8_t __a, uint8_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcalt_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_suqaddqi_ssu (__a, __b);
++ return __builtin_aarch64_facltv4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int16_t __attribute__ ((__always_inline__))
+-vuqaddh_s16 (int16_t __a, uint16_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return __builtin_aarch64_suqaddhi_ssu (__a, __b);
++ return __builtin_aarch64_facltv8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+-vuqadds_s32 (int32_t __a, uint32_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceq_f16 (float16x4_t __a, float16x4_t __b)
+ {
+- return __builtin_aarch64_suqaddsi_ssu (__a, __b);
++ return __builtin_aarch64_cmeqv4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int64_t __attribute__ ((__always_inline__))
+-vuqaddd_s64 (int64_t __a, uint64_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vceqq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+- return __builtin_aarch64_suqadddi_ssu (__a, __b);
++ return __builtin_aarch64_cmeqv8hf_uss (__a, __b);
+ }
+
+-#define __DEFINTERLEAVE(op, rettype, intype, funcsuffix, Q) \
+- __extension__ static __inline rettype \
+- __attribute__ ((__always_inline__)) \
+- v ## op ## Q ## _ ## funcsuffix (intype a, intype b) \
+- { \
+- return (rettype) {v ## op ## 1 ## Q ## _ ## funcsuffix (a, b), \
+- v ## op ## 2 ## Q ## _ ## funcsuffix (a, b)}; \
+- }
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcge_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_aarch64_cmgev4hf_uss (__a, __b);
++}
+
+-#define __INTERLEAVE_LIST(op) \
+- __DEFINTERLEAVE (op, float32x2x2_t, float32x2_t, f32,) \
+- __DEFINTERLEAVE (op, poly8x8x2_t, poly8x8_t, p8,) \
+- __DEFINTERLEAVE (op, poly16x4x2_t, poly16x4_t, p16,) \
+- __DEFINTERLEAVE (op, int8x8x2_t, int8x8_t, s8,) \
+- __DEFINTERLEAVE (op, int16x4x2_t, int16x4_t, s16,) \
+- __DEFINTERLEAVE (op, int32x2x2_t, int32x2_t, s32,) \
+- __DEFINTERLEAVE (op, uint8x8x2_t, uint8x8_t, u8,) \
+- __DEFINTERLEAVE (op, uint16x4x2_t, uint16x4_t, u16,) \
+- __DEFINTERLEAVE (op, uint32x2x2_t, uint32x2_t, u32,) \
+- __DEFINTERLEAVE (op, float32x4x2_t, float32x4_t, f32, q) \
+- __DEFINTERLEAVE (op, poly8x16x2_t, poly8x16_t, p8, q) \
+- __DEFINTERLEAVE (op, poly16x8x2_t, poly16x8_t, p16, q) \
+- __DEFINTERLEAVE (op, int8x16x2_t, int8x16_t, s8, q) \
+- __DEFINTERLEAVE (op, int16x8x2_t, int16x8_t, s16, q) \
+- __DEFINTERLEAVE (op, int32x4x2_t, int32x4_t, s32, q) \
+- __DEFINTERLEAVE (op, uint8x16x2_t, uint8x16_t, u8, q) \
+- __DEFINTERLEAVE (op, uint16x8x2_t, uint16x8_t, u16, q) \
+- __DEFINTERLEAVE (op, uint32x4x2_t, uint32x4_t, u32, q)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgeq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_aarch64_cmgev8hf_uss (__a, __b);
++}
+
+-/* vuzp */
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgt_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_aarch64_cmgtv4hf_uss (__a, __b);
++}
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vuzp1_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_cmgtv8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vuzp1_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcle_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __builtin_aarch64_cmlev4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vuzp1_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcleq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_cmlev8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vuzp1_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vclt_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __builtin_aarch64_cmltv4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vuzp1_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcltq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_cmltv8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vuzp1_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_scvtfv4hi (__a, __b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vuzp1_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __builtin_aarch64_scvtfv8hi (__a, __b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vuzp1_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_ucvtfv4hi_sus (__a, __b);
++}
++
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
++{
++ return __builtin_aarch64_ucvtfv8hi_sus (__a, __b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vuzp1_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline int16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_fcvtzsv4hf (__a, __b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vuzp1q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline int16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_fcvtzsv8hf (__a, __b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vuzp1q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline uint16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vuzp1q_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline uint16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
+-#endif
++ return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vuzp1q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdiv_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __a / __b;
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vuzp1q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vdivq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
+-#endif
++ return __a / __b;
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vuzp1q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmax_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __builtin_aarch64_smax_nanv4hf (__a, __b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vuzp1q_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_smax_nanv8hf (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vuzp1q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_fmaxv4hf (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vuzp1q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30});
+-#endif
++ return __builtin_aarch64_fmaxv8hf (__a, __b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vuzp1q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmin_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {9, 11, 13, 15, 1, 3, 5, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 2, 4, 6, 8, 10, 12, 14});
+-#endif
++ return __builtin_aarch64_smin_nanv4hf (__a, __b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vuzp1q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {5, 7, 1, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 2, 4, 6});
+-#endif
++ return __builtin_aarch64_smin_nanv8hf (__a, __b);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vuzp1q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnm_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_fminv4hf (__a, __b);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vuzp2_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_fminv8hf (__a, __b);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vuzp2_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __a * __b;
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vuzp2_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
+-#endif
++ return __a * __b;
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vuzp2_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __builtin_aarch64_fmulxv4hf (__a, __b);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vuzp2_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
+-#endif
++ return __builtin_aarch64_fmulxv8hf (__a, __b);
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vuzp2_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpadd_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_faddpv4hf (a, b);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vuzp2_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpaddq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __builtin_aarch64_faddpv8hf (a, b);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vuzp2_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmax_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {1, 3, 5, 7});
+-#endif
++ return __builtin_aarch64_smax_nanpv4hf (a, b);
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vuzp2_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_smax_nanpv8hf (a, b);
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vuzp2q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnm_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
+-#endif
++ return __builtin_aarch64_smaxpv4hf (a, b);
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vuzp2q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmaxnmq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_smaxpv8hf (a, b);
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vuzp2q_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpmin_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
+-#endif
++ return __builtin_aarch64_smin_nanpv4hf (a, b);
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vuzp2q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __builtin_aarch64_smin_nanpv8hf (a, b);
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vuzp2q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnm_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
+-#else
+- return __builtin_shuffle (__a, __b,
+- (uint8x16_t) {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
+-#endif
++ return __builtin_aarch64_sminpv4hf (a, b);
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vuzp2q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vpminnmq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __builtin_aarch64_sminpv8hf (a, b);
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vuzp2q_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecps_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
+-#endif
++ return __builtin_aarch64_frecpsv4hf (__a, __b);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vuzp2q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_frecpsv8hf (__a, __b);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vuzp2q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrts_f16 (float16x4_t a, float16x4_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31});
+-#endif
++ return __builtin_aarch64_rsqrtsv4hf (a, b);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vuzp2q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vrsqrtsq_f16 (float16x8_t a, float16x8_t b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 10, 12, 14, 0, 2, 4, 6});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {1, 3, 5, 7, 9, 11, 13, 15});
+-#endif
++ return __builtin_aarch64_rsqrtsv8hf (a, b);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vuzp2q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsub_f16 (float16x4_t __a, float16x4_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 6, 0, 2});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {1, 3, 5, 7});
+-#endif
++ return __a - __b;
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vuzp2q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vsubq_f16 (float16x8_t __a, float16x8_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __a - __b;
+ }
+
+-__INTERLEAVE_LIST (uzp)
+-
+-/* vzip */
++/* ARMv8.2-A FP16 three operands vector intrinsics. */
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vzip1_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return __builtin_aarch64_fmav4hf (__b, __c, __a);
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vzip1_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return __builtin_aarch64_fmav8hf (__b, __c, __a);
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vzip1_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
+-#endif
++ return __builtin_aarch64_fnmav4hf (__b, __c, __a);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vzip1_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return __builtin_aarch64_fnmav8hf (__b, __c, __a);
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vzip1_s16 (int16x4_t __a, int16x4_t __b)
+-{
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
+-#endif
++/* ARMv8.2-A FP16 lane vector intrinsics. */
++
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmah_lane_f16 (float16_t __a, float16_t __b,
++ float16x4_t __c, const int __lane)
++{
++ return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vzip1_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmah_laneq_f16 (float16_t __a, float16_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return vfmah_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vzip1_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_lane_f16 (float16x4_t __a, float16x4_t __b,
++ float16x4_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return vfma_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vzip1_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_lane_f16 (float16x8_t __a, float16x8_t __b,
++ float16x4_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {0, 4, 1, 5});
+-#endif
++ return vfmaq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vzip1_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_laneq_f16 (float16x4_t __a, float16x4_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {0, 2});
+-#endif
++ return vfma_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vzip1q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_laneq_f16 (float16x8_t __a, float16x8_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
+-#endif
++ return vfmaq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vzip1q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfma_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return vfma_f16 (__a, __b, vdup_n_f16 (__c));
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vzip1q_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmaq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
+-#endif
++ return vfmaq_f16 (__a, __b, vdupq_n_f16 (__c));
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vzip1q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsh_lane_f16 (float16_t __a, float16_t __b,
++ float16x4_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vzip1q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsh_laneq_f16 (float16_t __a, float16_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
+-#endif
++ return vfmsh_f16 (__a, __b, __aarch64_vget_lane_any (__c, __lane));
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vzip1q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_lane_f16 (float16x4_t __a, float16x4_t __b,
++ float16x4_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return vfms_f16 (__a, __b, __aarch64_vdup_lane_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vzip1q_s32 (int32x4_t __a, int32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_lane_f16 (float16x8_t __a, float16x8_t __b,
++ float16x4_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
+-#endif
++ return vfmsq_f16 (__a, __b, __aarch64_vdupq_lane_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vzip1q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_laneq_f16 (float16x4_t __a, float16x4_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return vfms_f16 (__a, __b, __aarch64_vdup_laneq_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vzip1q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_laneq_f16 (float16x8_t __a, float16x8_t __b,
++ float16x8_t __c, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {24, 8, 25, 9, 26, 10, 27, 11, 28, 12, 29, 13, 30, 14, 31, 15});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23});
+-#endif
++ return vfmsq_f16 (__a, __b, __aarch64_vdupq_laneq_f16 (__c, __lane));
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vzip1q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfms_n_f16 (float16x4_t __a, float16x4_t __b, float16_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {12, 4, 13, 5, 14, 6, 15, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {0, 8, 1, 9, 2, 10, 3, 11});
+-#endif
++ return vfms_f16 (__a, __b, vdup_n_f16 (__c));
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vzip1q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vfmsq_n_f16 (float16x8_t __a, float16x8_t __b, float16_t __c)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {6, 2, 7, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {0, 4, 1, 5});
+-#endif
++ return vfmsq_f16 (__a, __b, vdupq_n_f16 (__c));
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vzip1q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {3, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {0, 2});
+-#endif
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+-vzip2_f32 (float32x2_t __a, float32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+ }
+
+-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+-vzip2_p8 (poly8x8_t __a, poly8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+ }
+
+-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+-vzip2_p16 (poly16x4_t __a, poly16x4_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+-#endif
++ return __a * __aarch64_vget_lane_any (__b, __lane);
+ }
+
+-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+-vzip2_s8 (int8x8_t __a, int8x8_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return vmul_f16 (__a, vdup_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+ }
+
+-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+-vzip2_s16 (int16x4_t __a, int16x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+-#endif
++ return vmulq_f16 (__a, vdupq_n_f16 (__aarch64_vget_lane_any (__b, __lane)));
+ }
+
+-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+-vzip2_s32 (int32x2_t __a, int32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmul_n_f16 (float16x4_t __a, float16_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return vmul_lane_f16 (__a, vdup_n_f16 (__b), 0);
+ }
+
+-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+-vzip2_u8 (uint8x8_t __a, uint8x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulq_n_f16 (float16x8_t __a, float16_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x8_t) {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return vmulq_laneq_f16 (__a, vdupq_n_f16 (__b), 0);
+ }
+
+-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+-vzip2_u16 (uint16x4_t __a, uint16x4_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxh_lane_f16 (float16_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x4_t) {2, 6, 3, 7});
+-#endif
++ return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+ }
+
+-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+-vzip2_u32 (uint32x2_t __a, uint32x2_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_lane_f16 (float16x4_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x2_t) {1, 3});
+-#endif
++ return vmulx_f16 (__a, __aarch64_vdup_lane_f16 (__b, __lane));
+ }
+
+-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+-vzip2q_f32 (float32x4_t __a, float32x4_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+-#endif
++ return vmulxq_f16 (__a, __aarch64_vdupq_lane_f16 (__b, __lane));
+ }
+
+-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+-vzip2q_f64 (float64x2_t __a, float64x2_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxh_laneq_f16 (float16_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return vmulxh_f16 (__a, __aarch64_vget_lane_any (__b, __lane));
+ }
+
+-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+-vzip2q_p8 (poly8x16_t __a, poly8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_laneq_f16 (float16x4_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+-#endif
++ return vmulx_f16 (__a, __aarch64_vdup_laneq_f16 (__b, __lane));
+ }
+
+-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+-vzip2q_p16 (poly16x8_t __a, poly16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_laneq_f16 (float16x8_t __a, float16x8_t __b, const int __lane)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return vmulxq_f16 (__a, __aarch64_vdupq_laneq_f16 (__b, __lane));
+ }
+
+-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+-vzip2q_s8 (int8x16_t __a, int8x16_t __b)
++__extension__ extern __inline float16x4_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulx_n_f16 (float16x4_t __a, float16_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+-#endif
++ return vmulx_f16 (__a, vdup_n_f16 (__b));
+ }
+
+-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+-vzip2q_s16 (int16x8_t __a, int16x8_t __b)
++__extension__ extern __inline float16x8_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmulxq_n_f16 (float16x8_t __a, float16_t __b)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return vmulxq_f16 (__a, vdupq_n_f16 (__b));
+ }
+
+-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+-vzip2q_s32 (int32x4_t __a, int32x4_t __b)
++/* ARMv8.2-A FP16 reduction vector intrinsics. */
++
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxv_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+-#endif
++ return __builtin_aarch64_reduc_smax_nan_scal_v4hf (__a);
+ }
+
+-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+-vzip2q_s64 (int64x2_t __a, int64x2_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxvq_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_reduc_smax_nan_scal_v8hf (__a);
+ }
+
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vzip2q_u8 (uint8x16_t __a, uint8x16_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminv_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {16, 0, 17, 1, 18, 2, 19, 3, 20, 4, 21, 5, 22, 6, 23, 7});
+-#else
+- return __builtin_shuffle (__a, __b, (uint8x16_t)
+- {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31});
+-#endif
++ return __builtin_aarch64_reduc_smin_nan_scal_v4hf (__a);
+ }
+
+-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+-vzip2q_u16 (uint16x8_t __a, uint16x8_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminvq_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint16x8_t) {8, 0, 9, 1, 10, 2, 11, 3});
+-#else
+- return __builtin_shuffle (__a, __b, (uint16x8_t)
+- {4, 12, 5, 13, 6, 14, 7, 15});
+-#endif
++ return __builtin_aarch64_reduc_smin_nan_scal_v8hf (__a);
+ }
+
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vzip2q_u32 (uint32x4_t __a, uint32x4_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmv_f16 (float16x4_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {4, 0, 5, 1});
+-#else
+- return __builtin_shuffle (__a, __b, (uint32x4_t) {2, 6, 3, 7});
+-#endif
++ return __builtin_aarch64_reduc_smax_scal_v4hf (__a);
+ }
+
+-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+-vzip2q_u64 (uint64x2_t __a, uint64x2_t __b)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vmaxnmvq_f16 (float16x8_t __a)
+ {
+-#ifdef __AARCH64EB__
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {2, 0});
+-#else
+- return __builtin_shuffle (__a, __b, (uint64x2_t) {1, 3});
+-#endif
++ return __builtin_aarch64_reduc_smax_scal_v8hf (__a);
+ }
+
+-__INTERLEAVE_LIST (zip)
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmv_f16 (float16x4_t __a)
++{
++ return __builtin_aarch64_reduc_smin_scal_v4hf (__a);
++}
+
+-#undef __INTERLEAVE_LIST
+-#undef __DEFINTERLEAVE
++__extension__ extern __inline float16_t
++__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
++vminnmvq_f16 (float16x8_t __a)
++{
++ return __builtin_aarch64_reduc_smin_scal_v8hf (__a);
++}
+
+-/* End of optimal implementations in approved order. */
++#pragma GCC pop_options
+
+ #undef __aarch64_vget_lane_any
+
+ #undef __aarch64_vdup_lane_any
++#undef __aarch64_vdup_lane_f16
+ #undef __aarch64_vdup_lane_f32
+ #undef __aarch64_vdup_lane_f64
+ #undef __aarch64_vdup_lane_p8
+@@ -25780,6 +30399,7 @@ __INTERLEAVE_LIST (zip)
+ #undef __aarch64_vdup_lane_u16
+ #undef __aarch64_vdup_lane_u32
+ #undef __aarch64_vdup_lane_u64
++#undef __aarch64_vdup_laneq_f16
+ #undef __aarch64_vdup_laneq_f32
+ #undef __aarch64_vdup_laneq_f64
+ #undef __aarch64_vdup_laneq_p8
+@@ -25792,6 +30412,7 @@ __INTERLEAVE_LIST (zip)
+ #undef __aarch64_vdup_laneq_u16
+ #undef __aarch64_vdup_laneq_u32
+ #undef __aarch64_vdup_laneq_u64
++#undef __aarch64_vdupq_lane_f16
+ #undef __aarch64_vdupq_lane_f32
+ #undef __aarch64_vdupq_lane_f64
+ #undef __aarch64_vdupq_lane_p8
+@@ -25804,6 +30425,7 @@ __INTERLEAVE_LIST (zip)
+ #undef __aarch64_vdupq_lane_u16
+ #undef __aarch64_vdupq_lane_u32
+ #undef __aarch64_vdupq_lane_u64
++#undef __aarch64_vdupq_laneq_f16
+ #undef __aarch64_vdupq_laneq_f32
+ #undef __aarch64_vdupq_laneq_f64
+ #undef __aarch64_vdupq_laneq_p8
+@@ -25817,6 +30439,4 @@ __INTERLEAVE_LIST (zip)
+ #undef __aarch64_vdupq_laneq_u32
+ #undef __aarch64_vdupq_laneq_u64
+
+-#pragma GCC pop_options
+-
+ #endif
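The arm_neon.h hunk above adds the ARMv8.2-A half-precision (FP16) vector intrinsics. A minimal usage sketch follows (it is not part of the patch; it assumes a compiler built with this support and invoked with -march=armv8.2-a+fp16, and the function name fma_then_max is only illustrative):

#include <arm_neon.h>

/* Fused multiply-add over two float16x8_t inputs, then reduce to the
   largest lane, using the vfmaq_f16 and vmaxvq_f16 intrinsics defined
   in the hunk above.  */
float16_t
fma_then_max (float16x8_t acc, float16x8_t x, float16x8_t y)
{
  float16x8_t r = vfmaq_f16 (acc, x, y);   /* acc + x * y, per lane */
  return vmaxvq_f16 (r);                   /* maximum across all 8 lanes */
}
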
+--- a/src/gcc/config/aarch64/atomics.md
++++ b/src/gcc/config/aarch64/atomics.md
+@@ -583,7 +583,7 @@
+ }
+ )
+
+-;; ARMv8.1 LSE instructions.
++;; ARMv8.1-A LSE instructions.
+
+ ;; Atomic swap with memory.
+ (define_insn "aarch64_atomic_swp<mode>"
+--- a/src/gcc/config/aarch64/geniterators.sh
++++ b/src/gcc/config/aarch64/geniterators.sh
+@@ -23,10 +23,7 @@
+ # BUILTIN_<ITERATOR> macros, which expand to VAR<N> Macros covering the
+ # same set of modes as the iterator in iterators.md
+ #
+-# Find the <ITERATOR> definitions (may span several lines), skip the ones
+-# which does not have a simple format because it contains characters we
+-# don't want to or can't handle (e.g P, PTR iterators change depending on
+-# Pmode and ptr_mode).
++# Find the <ITERATOR> definitions (may span several lines).
+ LC_ALL=C awk '
+ BEGIN {
+ print "/* -*- buffer-read-only: t -*- */"
+@@ -49,12 +46,24 @@ iterdef {
+ sub(/.*\(define_mode_iterator/, "", s)
+ }
+
+-iterdef && s ~ /\)/ {
++iterdef {
++ # Count the parentheses, the iterator definition ends
++ # if there are more closing ones than opening ones.
++ nopen = gsub(/\(/, "(", s)
++ nclose = gsub(/\)/, ")", s)
++ if (nopen >= nclose)
++ next
++
+ iterdef = 0
+
+ gsub(/[ \t]+/, " ", s)
+- sub(/ *\).*/, "", s)
++ sub(/ *\)[^)]*$/, "", s)
+ sub(/^ /, "", s)
++
++ # Drop the conditions.
++ gsub(/ *"[^"]*" *\)/, "", s)
++ gsub(/\( */, "", s)
++
+ if (s !~ /^[A-Za-z0-9_]+ \[[A-Z0-9 ]*\]$/)
+ next
+ sub(/\[ */, "", s)
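The geniterators.sh hunk above replaces the "closing paren on this line" test with parenthesis counting, so define_mode_iterator forms that span several lines and carry per-mode conditions such as (V4HF "TARGET_SIMD_F16INST") are still recognised, with the quoted conditions stripped afterwards. A rough C sketch of the same balance test follows (not part of the patch; like the awk script, it assumes the leading "(define_mode_iterator" has already been removed from the buffer):

/* Return nonzero once more ')' than '(' have been seen, i.e. the
   closing paren of the define_mode_iterator form has been reached.  */
static int
iterator_definition_complete (const char *s)
{
  int depth = 0;
  for (; *s; s++)
    {
      if (*s == '(')
        depth++;
      else if (*s == ')' && --depth < 0)
        return 1;
    }
  return 0;
}
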
+--- a/src/gcc/config/aarch64/iterators.md
++++ b/src/gcc/config/aarch64/iterators.md
+@@ -26,6 +26,9 @@
+ ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
+ (define_mode_iterator GPI [SI DI])
+
++;; Iterator for HI, SI, DI, some instructions can only work on these modes.
++(define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
++
+ ;; Iterator for QI and HI modes
+ (define_mode_iterator SHORT [QI HI])
+
+@@ -38,6 +41,9 @@
+ ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
+ (define_mode_iterator GPF [SF DF])
+
++;; Iterator for all scalar floating point modes (HF, SF, DF)
++(define_mode_iterator GPF_F16 [(HF "AARCH64_ISA_F16") SF DF])
++
+ ;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
+ (define_mode_iterator GPF_TF_F16 [HF SF DF TF])
+
+@@ -88,11 +94,22 @@
+ ;; Vector Float modes suitable for moving, loading and storing.
+ (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
+
+-;; Vector Float modes, barring HF modes.
++;; Vector Float modes.
+ (define_mode_iterator VDQF [V2SF V4SF V2DF])
++(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
++ (V8HF "TARGET_SIMD_F16INST")
++ V2SF V4SF V2DF])
+
+ ;; Vector Float modes, and DF.
+ (define_mode_iterator VDQF_DF [V2SF V4SF V2DF DF])
++(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
++ (V8HF "TARGET_SIMD_F16INST")
++ V2SF V4SF V2DF DF])
++(define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
++ (V8HF "TARGET_SIMD_F16INST")
++ V2SF V4SF V2DF
++ (HF "TARGET_SIMD_F16INST")
++ SF DF])
+
+ ;; Vector single Float modes.
+ (define_mode_iterator VDQSF [V2SF V4SF])
+@@ -150,10 +167,30 @@
+
+ ;; Vector modes except double int.
+ (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
++(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
++ V4HF V8HF V2SF V4SF V2DF])
+
+ ;; Vector modes for S type.
+ (define_mode_iterator VDQ_SI [V2SI V4SI])
+
++;; Vector modes for S and D
++(define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
++
++;; Vector modes for H, S and D
++(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
++ (V8HI "TARGET_SIMD_F16INST")
++ V2SI V4SI V2DI])
++
++;; Scalar and Vector modes for S and D
++(define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
++
++;; Scalar and Vector modes for S and D, Vector modes for H.
++(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
++ (V8HI "TARGET_SIMD_F16INST")
++ V2SI V4SI V2DI
++ (HI "TARGET_SIMD_F16INST")
++ SI DI])
++
+ ;; Vector modes for Q and H types.
+ (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
+
+@@ -193,7 +230,10 @@
+ (define_mode_iterator DX [DI DF])
+
+ ;; Modes available for <f>mul lane operations.
+-(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
++(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
++ (V4HF "TARGET_SIMD_F16INST")
++ (V8HF "TARGET_SIMD_F16INST")
++ V2SF V4SF V2DF])
+
+ ;; Modes available for <f>mul lane operations changing lane count.
+ (define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
+@@ -342,8 +382,8 @@
+ (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
+
+ ;; For inequal width int to float conversion
+-(define_mode_attr w1 [(SF "w") (DF "x")])
+-(define_mode_attr w2 [(SF "x") (DF "w")])
++(define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])
++(define_mode_attr w2 [(HF "x") (SF "x") (DF "w")])
+
+ (define_mode_attr short_mask [(HI "65535") (QI "255")])
+
+@@ -355,12 +395,13 @@
+
+ ;; For scalar usage of vector/FP registers
+ (define_mode_attr v [(QI "b") (HI "h") (SI "s") (DI "d")
+- (SF "s") (DF "d")
++ (HF "h") (SF "s") (DF "d")
+ (V8QI "") (V16QI "")
+ (V4HI "") (V8HI "")
+ (V2SI "") (V4SI "")
+ (V2DI "") (V2SF "")
+- (V4SF "") (V2DF "")])
++ (V4SF "") (V4HF "")
++ (V8HF "") (V2DF "")])
+
+ ;; For scalar usage of vector/FP registers, narrowing
+ (define_mode_attr vn2 [(QI "") (HI "b") (SI "h") (DI "s")
+@@ -385,7 +426,7 @@
+ (define_mode_attr vas [(DI "") (SI ".2s")])
+
+ ;; Map a floating point mode to the appropriate register name prefix
+-(define_mode_attr s [(SF "s") (DF "d")])
++(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
+
+ ;; Give the length suffix letter for a sign- or zero-extension.
+ (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
+@@ -421,8 +462,8 @@
+ (V4SF ".4s") (V2DF ".2d")
+ (DI "") (SI "")
+ (HI "") (QI "")
+- (TI "") (SF "")
+- (DF "")])
++ (TI "") (HF "")
++ (SF "") (DF "")])
+
+ ;; Register suffix narrowed modes for VQN.
+ (define_mode_attr Vmntype [(V8HI ".8b") (V4SI ".4h")
+@@ -437,10 +478,21 @@
+ (V2DI "d") (V4HF "h")
+ (V8HF "h") (V2SF "s")
+ (V4SF "s") (V2DF "d")
++ (HF "h")
+ (SF "s") (DF "d")
+ (QI "b") (HI "h")
+ (SI "s") (DI "d")])
+
++;; Vetype is used everywhere in scheduling type and assembly output,
++;; sometimes they are not the same, for example HF modes on some
++;; instructions. stype is defined to represent scheduling type
++;; more accurately.
++(define_mode_attr stype [(V8QI "b") (V16QI "b") (V4HI "s") (V8HI "s")
++ (V2SI "s") (V4SI "s") (V2DI "d") (V4HF "s")
++ (V8HF "s") (V2SF "s") (V4SF "s") (V2DF "d")
++ (HF "s") (SF "s") (DF "d") (QI "b") (HI "s")
++ (SI "s") (DI "d")])
++
+ ;; Mode-to-bitwise operation type mapping.
+ (define_mode_attr Vbtype [(V8QI "8b") (V16QI "16b")
+ (V4HI "8b") (V8HI "16b")
+@@ -598,7 +650,7 @@
+ (V4HF "V4HI") (V8HF "V8HI")
+ (V2SF "V2SI") (V4SF "V4SI")
+ (V2DF "V2DI") (DF "DI")
+- (SF "SI")])
++ (SF "SI") (HF "HI")])
+
+ ;; Lower case mode of results of comparison operations.
+ (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
+@@ -648,12 +700,21 @@
+ (define_mode_attr atomic_sfx
+ [(QI "b") (HI "h") (SI "") (DI "")])
+
+-(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si") (SF "si") (DF "di")])
+-(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI") (SF "SI") (DF "DI")])
++(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
++ (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
++ (SF "si") (DF "di") (SI "sf") (DI "df")
++ (V4HF "v4hi") (V8HF "v8hi") (V4HI "v4hf")
++ (V8HI "v8hf") (HF "hi") (HI "hf")])
++(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
++ (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
++ (SF "SI") (DF "DI") (SI "SF") (DI "DF")
++ (V4HF "V4HI") (V8HF "V8HI") (V4HI "V4HF")
++ (V8HI "V8HF") (HF "HI") (HI "HF")])
++
+
+ ;; for the inequal width integer to fp conversions
+-(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
+-(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
++(define_mode_attr fcvt_iesize [(HF "di") (SF "di") (DF "si")])
++(define_mode_attr FCVT_IESIZE [(HF "DI") (SF "DI") (DF "SI")])
+
+ (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
+ (V4HI "V8HI") (V8HI "V4HI")
+@@ -676,6 +737,7 @@
+ ;; the 'x' constraint. All other modes may use the 'w' constraint.
+ (define_mode_attr h_con [(V2SI "w") (V4SI "w")
+ (V4HI "x") (V8HI "x")
++ (V4HF "w") (V8HF "w")
+ (V2SF "w") (V4SF "w")
+ (V2DF "w") (DF "w")])
+
+@@ -684,6 +746,7 @@
+ (V4HI "") (V8HI "")
+ (V2SI "") (V4SI "")
+ (DI "") (V2DI "")
++ (V4HF "f") (V8HF "f")
+ (V2SF "f") (V4SF "f")
+ (V2DF "f") (DF "f")])
+
+@@ -692,6 +755,7 @@
+ (V4HI "") (V8HI "")
+ (V2SI "") (V4SI "")
+ (DI "") (V2DI "")
++ (V4HF "_fp") (V8HF "_fp")
+ (V2SF "_fp") (V4SF "_fp")
+ (V2DF "_fp") (DF "_fp")
+ (SF "_fp")])
+@@ -704,17 +768,19 @@
+ (V4HF "") (V8HF "_q")
+ (V2SF "") (V4SF "_q")
+ (V2DF "_q")
+- (QI "") (HI "") (SI "") (DI "") (SF "") (DF "")])
++ (QI "") (HI "") (SI "") (DI "") (HF "") (SF "") (DF "")])
+
+ (define_mode_attr vp [(V8QI "v") (V16QI "v")
+ (V4HI "v") (V8HI "v")
+ (V2SI "p") (V4SI "v")
+- (V2DI "p") (V2DF "p")
+- (V2SF "p") (V4SF "v")])
++ (V2DI "p") (V2DF "p")
++ (V2SF "p") (V4SF "v")
++ (V4HF "v") (V8HF "v")])
+
+ (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
+ (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
+
++;; Sum of lengths of instructions needed to move vector registers of a mode.
+ (define_mode_attr insn_count [(OI "8") (CI "12") (XI "16")])
+
+ ;; -fpic small model GOT reloc modifers: gotpage_lo15/lo14 for ILP64/32.
+@@ -876,9 +942,6 @@
+ ;; Similar, but when not(op)
+ (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
+
+-;; Sign- or zero-extending load
+-(define_code_attr ldrxt [(sign_extend "ldrs") (zero_extend "ldr")])
+-
+ ;; Sign- or zero-extending data-op
+ (define_code_attr su [(sign_extend "s") (zero_extend "u")
+ (sign_extract "s") (zero_extract "u")
+@@ -953,9 +1016,8 @@
+ (define_int_iterator ADDSUBHN2 [UNSPEC_ADDHN2 UNSPEC_RADDHN2
+ UNSPEC_SUBHN2 UNSPEC_RSUBHN2])
+
+-(define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN])
+-
+-(define_int_iterator FMAXMIN [UNSPEC_FMAXNM UNSPEC_FMINNM])
++(define_int_iterator FMAXMIN_UNS [UNSPEC_FMAX UNSPEC_FMIN
++ UNSPEC_FMAXNM UNSPEC_FMINNM])
+
+ (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
+
+@@ -1001,6 +1063,9 @@
+ (define_int_iterator FCVT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
+ UNSPEC_FRINTA UNSPEC_FRINTN])
+
++(define_int_iterator FCVT_F2FIXED [UNSPEC_FCVTZS UNSPEC_FCVTZU])
++(define_int_iterator FCVT_FIXED2F [UNSPEC_SCVTF UNSPEC_UCVTF])
++
+ (define_int_iterator FRECP [UNSPEC_FRECPE UNSPEC_FRECPX])
+
+ (define_int_iterator CRC [UNSPEC_CRC32B UNSPEC_CRC32H UNSPEC_CRC32W
+@@ -1036,7 +1101,9 @@
+ (UNSPEC_FMAXV "smax_nan")
+ (UNSPEC_FMIN "smin_nan")
+ (UNSPEC_FMINNMV "smin")
+- (UNSPEC_FMINV "smin_nan")])
++ (UNSPEC_FMINV "smin_nan")
++ (UNSPEC_FMAXNM "fmax")
++ (UNSPEC_FMINNM "fmin")])
+
+ (define_int_attr maxmin_uns_op [(UNSPEC_UMAXV "umax")
+ (UNSPEC_UMINV "umin")
+@@ -1047,13 +1114,9 @@
+ (UNSPEC_FMAXV "fmax")
+ (UNSPEC_FMIN "fmin")
+ (UNSPEC_FMINNMV "fminnm")
+- (UNSPEC_FMINV "fmin")])
+-
+-(define_int_attr fmaxmin [(UNSPEC_FMAXNM "fmax")
+- (UNSPEC_FMINNM "fmin")])
+-
+-(define_int_attr fmaxmin_op [(UNSPEC_FMAXNM "fmaxnm")
+- (UNSPEC_FMINNM "fminnm")])
++ (UNSPEC_FMINV "fmin")
++ (UNSPEC_FMAXNM "fmaxnm")
++ (UNSPEC_FMINNM "fminnm")])
+
+ (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
+ (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
+@@ -1137,6 +1200,11 @@
+ (UNSPEC_FRINTP "ceil") (UNSPEC_FRINTM "floor")
+ (UNSPEC_FRINTN "frintn")])
+
++(define_int_attr fcvt_fixed_insn [(UNSPEC_SCVTF "scvtf")
++ (UNSPEC_UCVTF "ucvtf")
++ (UNSPEC_FCVTZS "fcvtzs")
++ (UNSPEC_FCVTZU "fcvtzu")])
++
+ (define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
+ (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
+ (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
+--- a/src/gcc/config/arm/aarch-cost-tables.h
++++ b/src/gcc/config/arm/aarch-cost-tables.h
+@@ -191,35 +191,35 @@ const struct cpu_cost_table cortexa53_extra_costs =
+ {
+ /* FP SFmode */
+ {
+- COSTS_N_INSNS (15), /* div. */
+- COSTS_N_INSNS (3), /* mult. */
+- COSTS_N_INSNS (7), /* mult_addsub. */
+- COSTS_N_INSNS (7), /* fma. */
+- COSTS_N_INSNS (3), /* addsub. */
+- COSTS_N_INSNS (1), /* fpconst. */
+- COSTS_N_INSNS (2), /* neg. */
+- COSTS_N_INSNS (1), /* compare. */
+- COSTS_N_INSNS (3), /* widen. */
+- COSTS_N_INSNS (3), /* narrow. */
+- COSTS_N_INSNS (3), /* toint. */
+- COSTS_N_INSNS (3), /* fromint. */
+- COSTS_N_INSNS (3) /* roundint. */
++ COSTS_N_INSNS (5), /* div. */
++ COSTS_N_INSNS (1), /* mult. */
++ COSTS_N_INSNS (2), /* mult_addsub. */
++ COSTS_N_INSNS (2), /* fma. */
++ COSTS_N_INSNS (1), /* addsub. */
++ 0, /* fpconst. */
++ COSTS_N_INSNS (1), /* neg. */
++ 0, /* compare. */
++ COSTS_N_INSNS (1), /* widen. */
++ COSTS_N_INSNS (1), /* narrow. */
++ COSTS_N_INSNS (1), /* toint. */
++ COSTS_N_INSNS (1), /* fromint. */
++ COSTS_N_INSNS (1) /* roundint. */
+ },
+ /* FP DFmode */
+ {
+- COSTS_N_INSNS (30), /* div. */
+- COSTS_N_INSNS (3), /* mult. */
+- COSTS_N_INSNS (7), /* mult_addsub. */
+- COSTS_N_INSNS (7), /* fma. */
+- COSTS_N_INSNS (3), /* addsub. */
+- COSTS_N_INSNS (1), /* fpconst. */
+- COSTS_N_INSNS (2), /* neg. */
+- COSTS_N_INSNS (1), /* compare. */
+- COSTS_N_INSNS (3), /* widen. */
+- COSTS_N_INSNS (3), /* narrow. */
+- COSTS_N_INSNS (3), /* toint. */
+- COSTS_N_INSNS (3), /* fromint. */
+- COSTS_N_INSNS (3) /* roundint. */
++ COSTS_N_INSNS (10), /* div. */
++ COSTS_N_INSNS (1), /* mult. */
++ COSTS_N_INSNS (2), /* mult_addsub. */
++ COSTS_N_INSNS (2), /* fma. */
++ COSTS_N_INSNS (1), /* addsub. */
++ 0, /* fpconst. */
++ COSTS_N_INSNS (1), /* neg. */
++ 0, /* compare. */
++ COSTS_N_INSNS (1), /* widen. */
++ COSTS_N_INSNS (1), /* narrow. */
++ COSTS_N_INSNS (1), /* toint. */
++ COSTS_N_INSNS (1), /* fromint. */
++ COSTS_N_INSNS (1) /* roundint. */
+ }
+ },
+ /* Vector */
+@@ -294,35 +294,35 @@ const struct cpu_cost_table cortexa57_extra_costs =
+ {
+ /* FP SFmode */
+ {
+- COSTS_N_INSNS (17), /* div. */
+- COSTS_N_INSNS (5), /* mult. */
+- COSTS_N_INSNS (9), /* mult_addsub. */
+- COSTS_N_INSNS (9), /* fma. */
+- COSTS_N_INSNS (4), /* addsub. */
+- COSTS_N_INSNS (2), /* fpconst. */
+- COSTS_N_INSNS (2), /* neg. */
+- COSTS_N_INSNS (2), /* compare. */
+- COSTS_N_INSNS (4), /* widen. */
+- COSTS_N_INSNS (4), /* narrow. */
+- COSTS_N_INSNS (4), /* toint. */
+- COSTS_N_INSNS (4), /* fromint. */
+- COSTS_N_INSNS (4) /* roundint. */
++ COSTS_N_INSNS (6), /* div. */
++ COSTS_N_INSNS (1), /* mult. */
++ COSTS_N_INSNS (2), /* mult_addsub. */
++ COSTS_N_INSNS (2), /* fma. */
++ COSTS_N_INSNS (1), /* addsub. */
++ 0, /* fpconst. */
++ 0, /* neg. */
++ 0, /* compare. */
++ COSTS_N_INSNS (1), /* widen. */
++ COSTS_N_INSNS (1), /* narrow. */
++ COSTS_N_INSNS (1), /* toint. */
++ COSTS_N_INSNS (1), /* fromint. */
++ COSTS_N_INSNS (1) /* roundint. */
+ },
+ /* FP DFmode */
+ {
+- COSTS_N_INSNS (31), /* div. */
+- COSTS_N_INSNS (5), /* mult. */
+- COSTS_N_INSNS (9), /* mult_addsub. */
+- COSTS_N_INSNS (9), /* fma. */
+- COSTS_N_INSNS (4), /* addsub. */
+- COSTS_N_INSNS (2), /* fpconst. */
+- COSTS_N_INSNS (2), /* neg. */
+- COSTS_N_INSNS (2), /* compare. */
+- COSTS_N_INSNS (4), /* widen. */
+- COSTS_N_INSNS (4), /* narrow. */
+- COSTS_N_INSNS (4), /* toint. */
+- COSTS_N_INSNS (4), /* fromint. */
+- COSTS_N_INSNS (4) /* roundint. */
++ COSTS_N_INSNS (11), /* div. */
++ COSTS_N_INSNS (1), /* mult. */
++ COSTS_N_INSNS (2), /* mult_addsub. */
++ COSTS_N_INSNS (2), /* fma. */
++ COSTS_N_INSNS (1), /* addsub. */
++ 0, /* fpconst. */
++ 0, /* neg. */
++ 0, /* compare. */
++ COSTS_N_INSNS (1), /* widen. */
++ COSTS_N_INSNS (1), /* narrow. */
++ COSTS_N_INSNS (1), /* toint. */
++ COSTS_N_INSNS (1), /* fromint. */
++ COSTS_N_INSNS (1) /* roundint. */
+ }
+ },
+ /* Vector */
+@@ -537,4 +537,107 @@ const struct cpu_cost_table xgene1_extra_costs =
+ }
+ };
+
++const struct cpu_cost_table qdf24xx_extra_costs =
++{
++ /* ALU */
++ {
++ 0, /* arith. */
++ 0, /* logical. */
++ 0, /* shift. */
++ 0, /* shift_reg. */
++ COSTS_N_INSNS (1), /* arith_shift. */
++ COSTS_N_INSNS (1), /* arith_shift_reg. */
++ 0, /* log_shift. */
++ 0, /* log_shift_reg. */
++ 0, /* extend. */
++ 0, /* extend_arith. */
++ 0, /* bfi. */
++ 0, /* bfx. */
++ 0, /* clz. */
++ 0, /* rev. */
++ 0, /* non_exec. */
++ true /* non_exec_costs_exec. */
++ },
++ {
++ /* MULT SImode */
++ {
++ COSTS_N_INSNS (2), /* simple. */
++ COSTS_N_INSNS (2), /* flag_setting. */
++ COSTS_N_INSNS (2), /* extend. */
++ COSTS_N_INSNS (2), /* add. */
++ COSTS_N_INSNS (2), /* extend_add. */
++ COSTS_N_INSNS (4) /* idiv. */
++ },
++ /* MULT DImode */
++ {
++ COSTS_N_INSNS (3), /* simple. */
++ 0, /* flag_setting (N/A). */
++ COSTS_N_INSNS (3), /* extend. */
++ COSTS_N_INSNS (3), /* add. */
++ COSTS_N_INSNS (3), /* extend_add. */
++ COSTS_N_INSNS (9) /* idiv. */
++ }
++ },
++ /* LD/ST */
++ {
++ COSTS_N_INSNS (2), /* load. */
++ COSTS_N_INSNS (2), /* load_sign_extend. */
++ COSTS_N_INSNS (2), /* ldrd. */
++ COSTS_N_INSNS (2), /* ldm_1st. */
++ 1, /* ldm_regs_per_insn_1st. */
++ 2, /* ldm_regs_per_insn_subsequent. */
++ COSTS_N_INSNS (2), /* loadf. */
++ COSTS_N_INSNS (2), /* loadd. */
++ COSTS_N_INSNS (3), /* load_unaligned. */
++ 0, /* store. */
++ 0, /* strd. */
++ 0, /* stm_1st. */
++ 1, /* stm_regs_per_insn_1st. */
++ 2, /* stm_regs_per_insn_subsequent. */
++ 0, /* storef. */
++ 0, /* stored. */
++ COSTS_N_INSNS (1), /* store_unaligned. */
++ COSTS_N_INSNS (1), /* loadv. */
++ COSTS_N_INSNS (1) /* storev. */
++ },
++ {
++ /* FP SFmode */
++ {
++ COSTS_N_INSNS (6), /* div. */
++ COSTS_N_INSNS (5), /* mult. */
++ COSTS_N_INSNS (5), /* mult_addsub. */
++ COSTS_N_INSNS (5), /* fma. */
++ COSTS_N_INSNS (3), /* addsub. */
++ COSTS_N_INSNS (1), /* fpconst. */
++ COSTS_N_INSNS (1), /* neg. */
++ COSTS_N_INSNS (2), /* compare. */
++ COSTS_N_INSNS (4), /* widen. */
++ COSTS_N_INSNS (4), /* narrow. */
++ COSTS_N_INSNS (4), /* toint. */
++ COSTS_N_INSNS (4), /* fromint. */
++ COSTS_N_INSNS (2) /* roundint. */
++ },
++ /* FP DFmode */
++ {
++ COSTS_N_INSNS (11), /* div. */
++ COSTS_N_INSNS (6), /* mult. */
++ COSTS_N_INSNS (6), /* mult_addsub. */
++ COSTS_N_INSNS (6), /* fma. */
++ COSTS_N_INSNS (3), /* addsub. */
++ COSTS_N_INSNS (1), /* fpconst. */
++ COSTS_N_INSNS (1), /* neg. */
++ COSTS_N_INSNS (2), /* compare. */
++ COSTS_N_INSNS (4), /* widen. */
++ COSTS_N_INSNS (4), /* narrow. */
++ COSTS_N_INSNS (4), /* toint. */
++ COSTS_N_INSNS (4), /* fromint. */
++ COSTS_N_INSNS (2) /* roundint. */
++ }
++ },
++ /* Vector */
++ {
++ COSTS_N_INSNS (1) /* alu. */
++ }
++};
++
+ #endif /* GCC_AARCH_COST_TABLES_H */
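The cost-table hunks above retune the Cortex-A53 and Cortex-A57 floating-point rows and add a qdf24xx table, all expressed through COSTS_N_INSNS. A minimal standalone sketch of how such an entry is consumed follows; it is illustrative only, assuming the usual rtl.h definition of COSTS_N_INSNS (four cost units per instruction) and that the backend adds the "extra" cost on top of a one-instruction baseline.

  #include <stdio.h>

  /* Same scaling as gcc/rtl.h: one instruction == 4 cost units.  */
  #define COSTS_N_INSNS(n) ((n) * 4)

  /* Cut-down stand-in for one per-mode FP row of cpu_cost_table.  */
  struct fp_costs { int div, mult, fma, addsub; };

  static const struct fp_costs a53_sf = {
    COSTS_N_INSNS (5),   /* div, as in the new cortexa53 SFmode row.  */
    COSTS_N_INSNS (1),   /* mult.  */
    COSTS_N_INSNS (2),   /* fma.  */
    COSTS_N_INSNS (1),   /* addsub.  */
  };

  int main (void)
  {
    /* The backend typically adds the extra cost to a one-insn baseline,
       so an SFmode divide is costed as 1 + 5 instructions' worth.  */
    printf ("SF div rtx cost: %d units\n", COSTS_N_INSNS (1) + a53_sf.div);
    return 0;
  }
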
+--- a/src/gcc/config/arm/arm-arches.def
++++ b/src/gcc/config/arm/arm-arches.def
+@@ -58,10 +58,22 @@ ARM_ARCH("armv7e-m", cortexm4, 7EM, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_F
+ ARM_ARCH("armv8-a", cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
+ ARM_ARCH("armv8-a+crc",cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A))
+ ARM_ARCH("armv8.1-a", cortexa53, 8A,
+- ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A))
++ ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
++ FL2_FOR_ARCH8_1A))
+ ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
+ ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+ FL2_FOR_ARCH8_1A))
++ARM_ARCH ("armv8.2-a", cortexa53, 8A,
++ ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
++ FL2_FOR_ARCH8_2A))
++ARM_ARCH ("armv8.2-a+fp16", cortexa53, 8A,
++ ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
++ FL2_FOR_ARCH8_2A | FL2_FP16INST))
++ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
++ ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE))
++ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
++ ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN))
++ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
++ ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
+ ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
+ ARM_ARCH("iwmmxt2", iwmmxt2, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
+-
+--- a/src/gcc/config/arm/arm-builtins.c
++++ b/src/gcc/config/arm/arm-builtins.c
+@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ #define ti_UP TImode
+ #define ei_UP EImode
+ #define oi_UP OImode
++#define hf_UP HFmode
++#define si_UP SImode
+
+ #define UP(X) X##_UP
+
+@@ -239,12 +241,22 @@ typedef struct {
+ VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+ VAR1 (T, N, L)
+
+-/* The NEON builtin data can be found in arm_neon_builtins.def.
+- The mode entries in the following table correspond to the "key" type of the
+- instruction variant, i.e. equivalent to that which would be specified after
+- the assembler mnemonic, which usually refers to the last vector operand.
+- The modes listed per instruction should be the same as those defined for
+- that instruction's pattern in neon.md. */
++/* The NEON builtin data can be found in arm_neon_builtins.def and
++ arm_vfp_builtins.def. The entries in arm_neon_builtins.def require
++ TARGET_NEON to be true. The entries in arm_vfp_builtins.def require
++ TARGET_VFP to be true. The feature tests are checked when the builtins are
++ expanded.
++
++ The mode entries in the following table correspond to
++ the "key" type of the instruction variant, i.e. equivalent to that which
++ would be specified after the assembler mnemonic, which usually refers to the
++ last vector operand. The modes listed per instruction should be the same as
++ those defined for that instruction's pattern in neon.md. */
++
++static neon_builtin_datum vfp_builtin_data[] =
++{
++#include "arm_vfp_builtins.def"
++};
+
+ static neon_builtin_datum neon_builtin_data[] =
+ {
+@@ -534,6 +546,10 @@ enum arm_builtins
+ #undef CRYPTO2
+ #undef CRYPTO3
+
++ ARM_BUILTIN_VFP_BASE,
++
++#include "arm_vfp_builtins.def"
++
+ ARM_BUILTIN_NEON_BASE,
+ ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
+
+@@ -542,8 +558,11 @@ enum arm_builtins
+ ARM_BUILTIN_MAX
+ };
+
++#define ARM_BUILTIN_VFP_PATTERN_START \
++ (ARM_BUILTIN_VFP_BASE + 1)
++
+ #define ARM_BUILTIN_NEON_PATTERN_START \
+- (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
++ (ARM_BUILTIN_NEON_BASE + 1)
+
+ #undef CF
+ #undef VAR1
+@@ -895,6 +914,110 @@ arm_init_simd_builtin_scalar_types (void)
+ "__builtin_neon_uti");
+ }
+
++/* Set up a NEON builtin. */
++
++static void
++arm_init_neon_builtin (unsigned int fcode,
++ neon_builtin_datum *d)
++{
++ bool print_type_signature_p = false;
++ char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
++ char namebuf[60];
++ tree ftype = NULL;
++ tree fndecl = NULL;
++
++ d->fcode = fcode;
++
++ /* We must track two variables here. op_num is
++ the operand number as in the RTL pattern. This is
++ required to access the mode (e.g. V4SF mode) of the
++ argument, from which the base type can be derived.
++ arg_num is an index in to the qualifiers data, which
++ gives qualifiers to the type (e.g. const unsigned).
++ The reason these two variables may differ by one is the
++ void return type. While all return types take the 0th entry
++ in the qualifiers array, there is no operand for them in the
++ RTL pattern. */
++ int op_num = insn_data[d->code].n_operands - 1;
++ int arg_num = d->qualifiers[0] & qualifier_void
++ ? op_num + 1
++ : op_num;
++ tree return_type = void_type_node, args = void_list_node;
++ tree eltype;
++
++ /* Build a function type directly from the insn_data for this
++ builtin. The build_function_type () function takes care of
++ removing duplicates for us. */
++ for (; op_num >= 0; arg_num--, op_num--)
++ {
++ machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
++ enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
++
++ if (qualifiers & qualifier_unsigned)
++ {
++ type_signature[arg_num] = 'u';
++ print_type_signature_p = true;
++ }
++ else if (qualifiers & qualifier_poly)
++ {
++ type_signature[arg_num] = 'p';
++ print_type_signature_p = true;
++ }
++ else
++ type_signature[arg_num] = 's';
++
++ /* Skip an internal operand for vget_{low, high}. */
++ if (qualifiers & qualifier_internal)
++ continue;
++
++ /* Some builtins have different user-facing types
++ for certain arguments, encoded in d->mode. */
++ if (qualifiers & qualifier_map_mode)
++ op_mode = d->mode;
++
++ /* For pointers, we want a pointer to the basic type
++ of the vector. */
++ if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
++ op_mode = GET_MODE_INNER (op_mode);
++
++ eltype = arm_simd_builtin_type
++ (op_mode,
++ (qualifiers & qualifier_unsigned) != 0,
++ (qualifiers & qualifier_poly) != 0);
++ gcc_assert (eltype != NULL);
++
++ /* Add qualifiers. */
++ if (qualifiers & qualifier_const)
++ eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
++
++ if (qualifiers & qualifier_pointer)
++ eltype = build_pointer_type (eltype);
++
++ /* If we have reached arg_num == 0, we are at a non-void
++ return type. Otherwise, we are still processing
++ arguments. */
++ if (arg_num == 0)
++ return_type = eltype;
++ else
++ args = tree_cons (NULL_TREE, eltype, args);
++ }
++
++ ftype = build_function_type (return_type, args);
++
++ gcc_assert (ftype != NULL);
++
++ if (print_type_signature_p)
++ snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
++ d->name, type_signature);
++ else
++ snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
++ d->name);
++
++ fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
++ NULL, NULL_TREE);
++ arm_builtin_decls[fcode] = fndecl;
++}
++
+ /* Set up all the NEON builtins, even builtins for instructions that are not
+ in the current target ISA to allow the user to compile particular modules
+ with different target specific options that differ from the command line
+@@ -924,103 +1047,22 @@ arm_init_neon_builtins (void)
+
+ for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
+ {
+- bool print_type_signature_p = false;
+- char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
+ neon_builtin_datum *d = &neon_builtin_data[i];
+- char namebuf[60];
+- tree ftype = NULL;
+- tree fndecl = NULL;
+-
+- d->fcode = fcode;
+-
+- /* We must track two variables here. op_num is
+- the operand number as in the RTL pattern. This is
+- required to access the mode (e.g. V4SF mode) of the
+- argument, from which the base type can be derived.
+- arg_num is an index in to the qualifiers data, which
+- gives qualifiers to the type (e.g. const unsigned).
+- The reason these two variables may differ by one is the
+- void return type. While all return types take the 0th entry
+- in the qualifiers array, there is no operand for them in the
+- RTL pattern. */
+- int op_num = insn_data[d->code].n_operands - 1;
+- int arg_num = d->qualifiers[0] & qualifier_void
+- ? op_num + 1
+- : op_num;
+- tree return_type = void_type_node, args = void_list_node;
+- tree eltype;
+-
+- /* Build a function type directly from the insn_data for this
+- builtin. The build_function_type () function takes care of
+- removing duplicates for us. */
+- for (; op_num >= 0; arg_num--, op_num--)
+- {
+- machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+- enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
+-
+- if (qualifiers & qualifier_unsigned)
+- {
+- type_signature[arg_num] = 'u';
+- print_type_signature_p = true;
+- }
+- else if (qualifiers & qualifier_poly)
+- {
+- type_signature[arg_num] = 'p';
+- print_type_signature_p = true;
+- }
+- else
+- type_signature[arg_num] = 's';
+-
+- /* Skip an internal operand for vget_{low, high}. */
+- if (qualifiers & qualifier_internal)
+- continue;
+-
+- /* Some builtins have different user-facing types
+- for certain arguments, encoded in d->mode. */
+- if (qualifiers & qualifier_map_mode)
+- op_mode = d->mode;
+-
+- /* For pointers, we want a pointer to the basic type
+- of the vector. */
+- if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+- op_mode = GET_MODE_INNER (op_mode);
+-
+- eltype = arm_simd_builtin_type
+- (op_mode,
+- (qualifiers & qualifier_unsigned) != 0,
+- (qualifiers & qualifier_poly) != 0);
+- gcc_assert (eltype != NULL);
+-
+- /* Add qualifiers. */
+- if (qualifiers & qualifier_const)
+- eltype = build_qualified_type (eltype, TYPE_QUAL_CONST);
+-
+- if (qualifiers & qualifier_pointer)
+- eltype = build_pointer_type (eltype);
+-
+- /* If we have reached arg_num == 0, we are at a non-void
+- return type. Otherwise, we are still processing
+- arguments. */
+- if (arg_num == 0)
+- return_type = eltype;
+- else
+- args = tree_cons (NULL_TREE, eltype, args);
+- }
+-
+- ftype = build_function_type (return_type, args);
++ arm_init_neon_builtin (fcode, d);
++ }
++}
+
+- gcc_assert (ftype != NULL);
++/* Set up all the scalar floating point builtins. */
+
+- if (print_type_signature_p)
+- snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s",
+- d->name, type_signature);
+- else
+- snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s",
+- d->name);
++static void
++arm_init_vfp_builtins (void)
++{
++ unsigned int i, fcode = ARM_BUILTIN_VFP_PATTERN_START;
+
+- fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD,
+- NULL, NULL_TREE);
+- arm_builtin_decls[fcode] = fndecl;
++ for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
++ {
++ neon_builtin_datum *d = &vfp_builtin_data[i];
++ arm_init_neon_builtin (fcode, d);
+ }
+ }
+
+@@ -1768,7 +1810,7 @@ arm_init_builtins (void)
+ if (TARGET_HARD_FLOAT)
+ {
+ arm_init_neon_builtins ();
+-
++ arm_init_vfp_builtins ();
+ arm_init_crypto_builtins ();
+ }
+
+@@ -2211,40 +2253,16 @@ constant_arg:
+ return target;
+ }
+
+-/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+- Most of these are "special" because they don't have symbolic
+- constants defined per-instruction or per instruction-variant. Instead, the
+- required info is looked up in the table neon_builtin_data. */
++/* Expand a neon builtin. This is also used for vfp builtins, which behave in
++ the same way. These builtins are "special" because they don't have symbolic
++ constants defined per-instruction or per instruction-variant. Instead, the
++ required info is looked up in the NEON_BUILTIN_DATA record that is passed
++ into the function. */
++
+ static rtx
+-arm_expand_neon_builtin (int fcode, tree exp, rtx target)
++arm_expand_neon_builtin_1 (int fcode, tree exp, rtx target,
++ neon_builtin_datum *d)
+ {
+- /* Check in the context of the function making the call whether the
+- builtin is supported. */
+- if (! TARGET_NEON)
+- {
+- fatal_error (input_location,
+- "You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics.");
+- return const0_rtx;
+- }
+-
+- if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+- {
+- /* Builtin is only to check bounds of the lane passed to some intrinsics
+- that are implemented with gcc vector extensions in arm_neon.h. */
+-
+- tree nlanes = CALL_EXPR_ARG (exp, 0);
+- gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+- rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+- if (CONST_INT_P (lane_idx))
+- neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+- else
+- error ("%Klane index must be a constant immediate", exp);
+- /* Don't generate any RTL. */
+- return const0_rtx;
+- }
+-
+- neon_builtin_datum *d =
+- &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
+ enum insn_code icode = d->code;
+ builtin_arg args[SIMD_MAX_BUILTIN_ARGS + 1];
+ int num_args = insn_data[d->code].n_operands;
+@@ -2260,8 +2278,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+ /* We have four arrays of data, each indexed in a different fashion.
+ qualifiers - element 0 always describes the function return type.
+ operands - element 0 is either the operand for return value (if
+- the function has a non-void return type) or the operand for the
+- first argument.
++ the function has a non-void return type) or the operand for the
++ first argument.
+ expr_args - element 0 always holds the first argument.
+ args - element 0 is always used for the return type. */
+ int qualifiers_k = k;
+@@ -2283,7 +2301,7 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+ bool op_const_int_p =
+ (CONST_INT_P (arg)
+ && (*insn_data[icode].operand[operands_k].predicate)
+- (arg, insn_data[icode].operand[operands_k].mode));
++ (arg, insn_data[icode].operand[operands_k].mode));
+ args[k] = op_const_int_p ? NEON_ARG_CONSTANT : NEON_ARG_COPY_TO_REG;
+ }
+ else if (d->qualifiers[qualifiers_k] & qualifier_pointer)
+@@ -2296,8 +2314,68 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
+ /* The interface to arm_expand_neon_args expects a 0 if
+ the function is void, and a 1 if it is not. */
+ return arm_expand_neon_args
+- (target, d->mode, fcode, icode, !is_void, exp,
+- &args[1]);
++ (target, d->mode, fcode, icode, !is_void, exp,
++ &args[1]);
++}
++
++/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
++ Most of these are "special" because they don't have symbolic
++ constants defined per-instruction or per instruction-variant. Instead, the
++ required info is looked up in the table neon_builtin_data. */
++
++static rtx
++arm_expand_neon_builtin (int fcode, tree exp, rtx target)
++{
++ if (fcode >= ARM_BUILTIN_NEON_BASE && ! TARGET_NEON)
++ {
++ fatal_error (input_location,
++ "You must enable NEON instructions"
++ " (e.g. -mfloat-abi=softfp -mfpu=neon)"
++ " to use these intrinsics.");
++ return const0_rtx;
++ }
++
++ if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
++ {
++ /* Builtin is only to check bounds of the lane passed to some intrinsics
++ that are implemented with gcc vector extensions in arm_neon.h. */
++
++ tree nlanes = CALL_EXPR_ARG (exp, 0);
++ gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
++ rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
++ if (CONST_INT_P (lane_idx))
++ neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
++ else
++ error ("%Klane index must be a constant immediate", exp);
++ /* Don't generate any RTL. */
++ return const0_rtx;
++ }
++
++ neon_builtin_datum *d
++ = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
++
++ return arm_expand_neon_builtin_1 (fcode, exp, target, d);
++}
++
++/* Expand a VFP builtin, if TARGET_VFP is true. These builtins are treated like
++ neon builtins except that the data is looked up in table
++ VFP_BUILTIN_DATA. */
++
++static rtx
++arm_expand_vfp_builtin (int fcode, tree exp, rtx target)
++{
++ if (fcode >= ARM_BUILTIN_VFP_BASE && ! TARGET_VFP)
++ {
++ fatal_error (input_location,
++ "You must enable VFP instructions"
++ " to use these intrinsics.");
++ return const0_rtx;
++ }
++
++ neon_builtin_datum *d
++ = &vfp_builtin_data[fcode - ARM_BUILTIN_VFP_PATTERN_START];
++
++ return arm_expand_neon_builtin_1 (fcode, exp, target, d);
+ }
+
+ /* Expand an expression EXP that calls a built-in function,
+@@ -2337,13 +2415,18 @@ arm_expand_builtin (tree exp,
+ if (fcode >= ARM_BUILTIN_NEON_BASE)
+ return arm_expand_neon_builtin (fcode, exp, target);
+
++ if (fcode >= ARM_BUILTIN_VFP_BASE)
++ return arm_expand_vfp_builtin (fcode, exp, target);
++
+ /* Check in the context of the function making the call whether the
+ builtin is supported. */
+ if (fcode >= ARM_BUILTIN_CRYPTO_BASE
+ && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
+ {
+ fatal_error (input_location,
+- "You must enable crypto intrinsics (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...) to use these intrinsics.");
++ "You must enable crypto instructions"
++ " (e.g. include -mfloat-abi=softfp -mfpu=crypto-neon...)"
++ " to use these intrinsics.");
+ return const0_rtx;
+ }
+
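The arm-builtins.c changes factor builtin initialisation into a shared arm_init_neon_builtin helper and dispatch expansion by code range: NEON builtin codes are allocated above the new ARM_BUILTIN_VFP_BASE, so arm_expand_builtin has to test the NEON range before the VFP range. The standalone sketch below only illustrates that ordering; the base values are invented, the real ones come from the arm_builtins enum.

  #include <stdio.h>

  /* Invented code ranges mimicking the enum layout: VFP builtins are
     allocated first, NEON builtins after them.  */
  enum { VFP_BASE = 100, NEON_BASE = 200 };

  static const char *
  expand_builtin (int fcode)
  {
    /* Test the higher range first; testing VFP_BASE first would also
       match every NEON code, which is what the real dispatch avoids.  */
    if (fcode >= NEON_BASE)
      return "expand as NEON builtin";
    if (fcode >= VFP_BASE)
      return "expand as VFP builtin";
    return "expand as core ARM builtin";
  }

  int main (void)
  {
    printf ("%s\n", expand_builtin (150));   /* VFP range.  */
    printf ("%s\n", expand_builtin (250));   /* NEON range.  */
    return 0;
  }
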
+--- a/src/gcc/config/arm/arm-c.c
++++ b/src/gcc/config/arm/arm-c.c
+@@ -135,10 +135,17 @@ arm_cpu_builtins (struct cpp_reader* pfile)
+ else
+ cpp_undef (pfile, "__ARM_FP");
+
+- if (arm_fp16_format == ARM_FP16_FORMAT_IEEE)
+- builtin_define ("__ARM_FP16_FORMAT_IEEE");
+- if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+- builtin_define ("__ARM_FP16_FORMAT_ALTERNATIVE");
++ def_or_undef_macro (pfile, "__ARM_FP16_FORMAT_IEEE",
++ arm_fp16_format == ARM_FP16_FORMAT_IEEE);
++ def_or_undef_macro (pfile, "__ARM_FP16_FORMAT_ALTERNATIVE",
++ arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE);
++ def_or_undef_macro (pfile, "__ARM_FP16_ARGS",
++ arm_fp16_format != ARM_FP16_FORMAT_NONE);
++
++ def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC",
++ TARGET_VFP_FP16INST);
++ def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC",
++ TARGET_NEON_FP16INST);
+
+ def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA);
+ def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
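The arm-c.c hunk switches the FP16 format macros to def_or_undef_macro and adds the ACLE feature-test macros for scalar and vector FP16 arithmetic. A hedged example of how user code might key off one of these macros follows; the macro name is taken from the hunk above, while the function and its fallback path are only illustrative.

  #include <stdio.h>

  /* Illustrative consumer of the FP16 feature-test macro defined above.  */
  static float
  scale_by_two (float x)
  {
  #if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
    /* Native half-precision arithmetic is available (ARMv8.2-A FP16).  */
    __fp16 h = (__fp16) x;
    h = h * (__fp16) 2.0;
    return (float) h;
  #else
    /* Otherwise stay in single precision.  */
    return x * 2.0f;
  #endif
  }

  int main (void)
  {
    printf ("%f\n", (double) scale_by_two (1.5f));
    return 0;
  }
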
+--- a/src/gcc/config/arm/arm-cores.def
++++ b/src/gcc/config/arm/arm-cores.def
+@@ -171,10 +171,14 @@ ARM_CORE("cortex-a35", cortexa35, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED
+ ARM_CORE("cortex-a53", cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
+ ARM_CORE("cortex-a57", cortexa57, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ ARM_CORE("cortex-a72", cortexa72, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
++ARM_CORE("cortex-a73", cortexa73, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
+ ARM_CORE("exynos-m1", exynosm1, exynosm1, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
+-ARM_CORE("qdf24xx", qdf24xx, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
++ARM_CORE("qdf24xx", qdf24xx, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
+ ARM_CORE("xgene1", xgene1, xgene1, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8A), xgene1)
+
+ /* V8 big.LITTLE implementations */
+ ARM_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ ARM_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
++ARM_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
++ARM_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
++
+--- a/src/gcc/config/arm/arm-modes.def
++++ b/src/gcc/config/arm/arm-modes.def
+@@ -59,6 +59,7 @@ CC_MODE (CC_DGEU);
+ CC_MODE (CC_DGTU);
+ CC_MODE (CC_C);
+ CC_MODE (CC_N);
++CC_MODE (CC_V);
+
+ /* Vector modes. */
+ VECTOR_MODES (INT, 4); /* V4QI V2HI */
+--- a/src/gcc/config/arm/arm-protos.h
++++ b/src/gcc/config/arm/arm-protos.h
+@@ -50,8 +50,12 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
+ ATTRIBUTE_UNUSED);
+ extern void arm_init_builtins (void);
+ extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
+-
++extern rtx arm_simd_vect_par_cnst_half (machine_mode mode, bool high);
++extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
++ bool high);
+ #ifdef RTX_CODE
++extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
++ rtx label_ref);
+ extern bool arm_vector_mode_supported_p (machine_mode);
+ extern bool arm_small_register_classes_for_mode_p (machine_mode);
+ extern int arm_hard_regno_mode_ok (unsigned int, machine_mode);
+@@ -161,6 +165,7 @@ extern const char *arm_output_iwmmxt_shift_immediate (const char *, rtx *, bool)
+ extern const char *arm_output_iwmmxt_tinsr (rtx *);
+ extern unsigned int arm_sync_loop_insns (rtx , rtx *);
+ extern int arm_attr_length_push_multi(rtx, rtx);
++extern int arm_attr_length_pop_multi(rtx *, bool, bool);
+ extern void arm_expand_compare_and_swap (rtx op[]);
+ extern void arm_split_compare_and_swap (rtx op[]);
+ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
+@@ -192,7 +197,6 @@ extern const char *thumb_call_via_reg (rtx);
+ extern void thumb_expand_movmemqi (rtx *);
+ extern rtx arm_return_addr (int, rtx);
+ extern void thumb_reload_out_hi (rtx *);
+-extern void thumb_reload_in_hi (rtx *);
+ extern void thumb_set_return_address (rtx, rtx);
+ extern const char *thumb1_output_casesi (rtx *);
+ extern const char *thumb2_output_casesi (rtx *);
+@@ -319,6 +323,7 @@ extern int vfp3_const_double_for_bits (rtx);
+
+ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
+ rtx);
++extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
+ extern bool arm_valid_symbolic_address_p (rtx);
+ extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
+ #endif /* RTX_CODE */
+@@ -388,36 +393,43 @@ extern bool arm_is_constant_pool_ref (rtx);
+ #define FL_ARCH6KZ (1 << 31) /* ARMv6KZ architecture. */
+
+ #define FL2_ARCH8_1 (1 << 0) /* Architecture 8.1. */
++#define FL2_ARCH8_2 (1 << 1) /* Architecture 8.2. */
++#define FL2_FP16INST (1 << 2) /* FP16 Instructions for ARMv8.2 and
++ later. */
+
+ /* Flags that only effect tuning, not available instructions. */
+ #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
+ | FL_CO_PROC)
+
+-#define FL_FOR_ARCH2 FL_NOTM
+-#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32)
+-#define FL_FOR_ARCH3M (FL_FOR_ARCH3 | FL_ARCH3M)
+-#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4)
+-#define FL_FOR_ARCH4T (FL_FOR_ARCH4 | FL_THUMB)
+-#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5)
+-#define FL_FOR_ARCH5T (FL_FOR_ARCH5 | FL_THUMB)
+-#define FL_FOR_ARCH5E (FL_FOR_ARCH5 | FL_ARCH5E)
+-#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
+-#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
+-#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6)
+-#define FL_FOR_ARCH6J FL_FOR_ARCH6
+-#define FL_FOR_ARCH6K (FL_FOR_ARCH6 | FL_ARCH6K)
+-#define FL_FOR_ARCH6Z FL_FOR_ARCH6
+-#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
+-#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
+-#define FL_FOR_ARCH6M (FL_FOR_ARCH6 & ~FL_NOTM)
+-#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
+-#define FL_FOR_ARCH7A (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
+-#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
+-#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_THUMB_DIV)
+-#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV)
+-#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM)
+-#define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8)
++#define FL_FOR_ARCH2 FL_NOTM
++#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32)
++#define FL_FOR_ARCH3M (FL_FOR_ARCH3 | FL_ARCH3M)
++#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4)
++#define FL_FOR_ARCH4T (FL_FOR_ARCH4 | FL_THUMB)
++#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5)
++#define FL_FOR_ARCH5T (FL_FOR_ARCH5 | FL_THUMB)
++#define FL_FOR_ARCH5E (FL_FOR_ARCH5 | FL_ARCH5E)
++#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB)
++#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE
++#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6)
++#define FL_FOR_ARCH6J FL_FOR_ARCH6
++#define FL_FOR_ARCH6K (FL_FOR_ARCH6 | FL_ARCH6K)
++#define FL_FOR_ARCH6Z FL_FOR_ARCH6
++#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K
++#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ)
++#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2)
++#define FL_FOR_ARCH6M (FL_FOR_ARCH6 & ~FL_NOTM)
++#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7)
++#define FL_FOR_ARCH7A (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K)
++#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV)
++#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_THUMB_DIV)
++#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV)
++#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM)
++#define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8)
+ #define FL2_FOR_ARCH8_1A FL2_ARCH8_1
++#define FL2_FOR_ARCH8_2A (FL2_FOR_ARCH8_1A | FL2_ARCH8_2)
++#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV)
++#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8)
+
+ /* There are too many feature bits to fit in a single word so the set of cpu and
+ fpu capabilities is a structure. A feature set is created and manipulated
+@@ -601,6 +613,9 @@ extern int arm_tune_cortex_a9;
+ interworking clean. */
+ extern int arm_cpp_interwork;
+
++/* Nonzero if chip supports Thumb 1. */
++extern int arm_arch_thumb1;
++
+ /* Nonzero if chip supports Thumb 2. */
+ extern int arm_arch_thumb2;
+
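The arm-protos.h changes grow the second word of the CPU feature set with FL2_ARCH8_2 and FL2_FP16INST and add the ARMv8-M flag combinations. The sketch below shows how such split flag words compose and are queried; the struct and the HAS_CPU2 macro are simplified stand-ins for the real arm_feature_set and ARM_FSET_HAS_CPU2 machinery, whose exact layout is not visible in this hunk.

  #include <stdio.h>

  /* Simplified two-word feature set standing in for arm_feature_set.  */
  struct fset { unsigned cpu1, cpu2; };

  #define FL2_ARCH8_1   (1u << 0)   /* Values mirror the hunk above.  */
  #define FL2_ARCH8_2   (1u << 1)
  #define FL2_FP16INST  (1u << 2)

  #define FL2_FOR_ARCH8_1A  FL2_ARCH8_1
  #define FL2_FOR_ARCH8_2A  (FL2_FOR_ARCH8_1A | FL2_ARCH8_2)

  /* Stand-in for ARM_FSET_HAS_CPU2: test a bit in the second word.  */
  #define HAS_CPU2(set, bit)  (((set).cpu2 & (bit)) != 0)

  int main (void)
  {
    struct fset armv8_2a_fp16 = { 0, FL2_FOR_ARCH8_2A | FL2_FP16INST };

    printf ("fp16 insns:  %d\n", HAS_CPU2 (armv8_2a_fp16, FL2_FP16INST));
    printf ("v8.1 parts:  %d\n", HAS_CPU2 (armv8_2a_fp16, FL2_ARCH8_1));
    return 0;
  }
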
+--- a/src/gcc/config/arm/arm-tables.opt
++++ b/src/gcc/config/arm/arm-tables.opt
+@@ -322,6 +322,9 @@ EnumValue
+ Enum(processor_type) String(cortex-a72) Value(cortexa72)
+
+ EnumValue
++Enum(processor_type) String(cortex-a73) Value(cortexa73)
++
++EnumValue
+ Enum(processor_type) String(exynos-m1) Value(exynosm1)
+
+ EnumValue
+@@ -336,6 +339,12 @@ Enum(processor_type) String(cortex-a57.cortex-a53) Value(cortexa57cortexa53)
+ EnumValue
+ Enum(processor_type) String(cortex-a72.cortex-a53) Value(cortexa72cortexa53)
+
++EnumValue
++Enum(processor_type) String(cortex-a73.cortex-a35) Value(cortexa73cortexa35)
++
++EnumValue
++Enum(processor_type) String(cortex-a73.cortex-a53) Value(cortexa73cortexa53)
++
+ Enum
+ Name(arm_arch) Type(int)
+ Known ARM architectures (for use with the -march= option):
+@@ -428,10 +437,25 @@ EnumValue
+ Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+ EnumValue
+-Enum(arm_arch) String(iwmmxt) Value(29)
++Enum(arm_arch) String(armv8.2-a) Value(29)
++
++EnumValue
++Enum(arm_arch) String(armv8.2-a+fp16) Value(30)
++
++EnumValue
++Enum(arm_arch) String(armv8-m.base) Value(31)
++
++EnumValue
++Enum(arm_arch) String(armv8-m.main) Value(32)
++
++EnumValue
++Enum(arm_arch) String(armv8-m.main+dsp) Value(33)
++
++EnumValue
++Enum(arm_arch) String(iwmmxt) Value(34)
+
+ EnumValue
+-Enum(arm_arch) String(iwmmxt2) Value(30)
++Enum(arm_arch) String(iwmmxt2) Value(35)
+
+ Enum
+ Name(arm_fpu) Type(int)
+--- a/src/gcc/config/arm/arm-tune.md
++++ b/src/gcc/config/arm/arm-tune.md
+@@ -34,6 +34,7 @@
+ cortexm3,marvell_pj4,cortexa15cortexa7,
+ cortexa17cortexa7,cortexa32,cortexa35,
+ cortexa53,cortexa57,cortexa72,
+- exynosm1,qdf24xx,xgene1,
+- cortexa57cortexa53,cortexa72cortexa53"
++ cortexa73,exynosm1,qdf24xx,
++ xgene1,cortexa57cortexa53,cortexa72cortexa53,
++ cortexa73cortexa35,cortexa73cortexa53"
+ (const (symbol_ref "((enum attr_tune) arm_tune)")))
+--- a/src/gcc/config/arm/arm.c
++++ b/src/gcc/config/arm/arm.c
+@@ -104,7 +104,6 @@ static void arm_print_operand_address (FILE *, machine_mode, rtx);
+ static bool arm_print_operand_punct_valid_p (unsigned char code);
+ static const char *fp_const_from_val (REAL_VALUE_TYPE *);
+ static arm_cc get_arm_condition_code (rtx);
+-static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
+ static const char *output_multi_immediate (rtx *, const char *, const char *,
+ int, HOST_WIDE_INT);
+ static const char *shift_op (rtx, HOST_WIDE_INT *);
+@@ -249,8 +248,6 @@ static void arm_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
+ static bool arm_output_addr_const_extra (FILE *, rtx);
+ static bool arm_allocate_stack_slots_for_args (void);
+ static bool arm_warn_func_return (tree);
+-static const char *arm_invalid_parameter_type (const_tree t);
+-static const char *arm_invalid_return_type (const_tree t);
+ static tree arm_promoted_type (const_tree t);
+ static tree arm_convert_to_type (tree type, tree expr);
+ static bool arm_scalar_mode_supported_p (machine_mode);
+@@ -300,6 +297,9 @@ static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
+ static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
+
+ static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
++static bool arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT,
++ const_tree);
++
+
+ /* Table of machine attributes. */
+ static const struct attribute_spec arm_attribute_table[] =
+@@ -463,7 +463,7 @@ static const struct attribute_spec arm_attribute_table[] =
+ #undef TARGET_ASM_OUTPUT_MI_THUNK
+ #define TARGET_ASM_OUTPUT_MI_THUNK arm_output_mi_thunk
+ #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
+-#define TARGET_ASM_CAN_OUTPUT_MI_THUNK default_can_output_mi_thunk_no_vcall
++#define TARGET_ASM_CAN_OUTPUT_MI_THUNK arm_can_output_mi_thunk
+
+ #undef TARGET_RTX_COSTS
+ #define TARGET_RTX_COSTS arm_rtx_costs
+@@ -654,12 +654,6 @@ static const struct attribute_spec arm_attribute_table[] =
+ #undef TARGET_PREFERRED_RELOAD_CLASS
+ #define TARGET_PREFERRED_RELOAD_CLASS arm_preferred_reload_class
+
+-#undef TARGET_INVALID_PARAMETER_TYPE
+-#define TARGET_INVALID_PARAMETER_TYPE arm_invalid_parameter_type
+-
+-#undef TARGET_INVALID_RETURN_TYPE
+-#define TARGET_INVALID_RETURN_TYPE arm_invalid_return_type
+-
+ #undef TARGET_PROMOTED_TYPE
+ #define TARGET_PROMOTED_TYPE arm_promoted_type
+
+@@ -820,6 +814,13 @@ int arm_arch8 = 0;
+ /* Nonzero if this chip supports the ARMv8.1 extensions. */
+ int arm_arch8_1 = 0;
+
++/* Nonzero if this chip supports the ARM Architecture 8.2 extensions. */
++int arm_arch8_2 = 0;
++
++/* Nonzero if this chip supports the FP16 instructions extension of ARM
++ Architecture 8.2. */
++int arm_fp16_inst = 0;
++
+ /* Nonzero if this chip can benefit from load scheduling. */
+ int arm_ld_sched = 0;
+
+@@ -852,6 +853,9 @@ int arm_tune_cortex_a9 = 0;
+ interworking clean. */
+ int arm_cpp_interwork = 0;
+
++/* Nonzero if chip supports Thumb 1. */
++int arm_arch_thumb1;
++
+ /* Nonzero if chip supports Thumb 2. */
+ int arm_arch_thumb2;
+
+@@ -2055,6 +2059,29 @@ const struct tune_params arm_xgene1_tune =
+ tune_params::SCHED_AUTOPREF_OFF
+ };
+
++const struct tune_params arm_qdf24xx_tune =
++{
++ arm_9e_rtx_costs,
++ &qdf24xx_extra_costs,
++ NULL, /* Scheduler cost adjustment. */
++ arm_default_branch_cost,
++ &arm_default_vec_cost, /* Vectorizer costs. */
++ 1, /* Constant limit. */
++ 2, /* Max cond insns. */
++ 8, /* Memset max inline. */
++ 4, /* Issue rate. */
++ ARM_PREFETCH_BENEFICIAL (0, -1, 64),
++ tune_params::PREF_CONST_POOL_FALSE,
++ tune_params::PREF_LDRD_TRUE,
++ tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
++ tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
++ tune_params::DISPARAGE_FLAGS_ALL,
++ tune_params::PREF_NEON_64_FALSE,
++ tune_params::PREF_NEON_STRINGOPS_TRUE,
++ FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
++ tune_params::SCHED_AUTOPREF_FULL
++};
++
+ /* Branches can be dual-issued on Cortex-A5, so conditional execution is
+ less appealing. Set max_insns_skipped to a low value. */
+
+@@ -2127,6 +2154,29 @@ const struct tune_params arm_cortex_a12_tune =
+ tune_params::SCHED_AUTOPREF_OFF
+ };
+
++const struct tune_params arm_cortex_a73_tune =
++{
++ arm_9e_rtx_costs,
++ &cortexa57_extra_costs,
++ NULL, /* Sched adj cost. */
++ arm_default_branch_cost,
++ &arm_default_vec_cost, /* Vectorizer costs. */
++ 1, /* Constant limit. */
++ 2, /* Max cond insns. */
++ 8, /* Memset max inline. */
++ 2, /* Issue rate. */
++ ARM_PREFETCH_NOT_BENEFICIAL,
++ tune_params::PREF_CONST_POOL_FALSE,
++ tune_params::PREF_LDRD_TRUE,
++ tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* Thumb. */
++ tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE, /* ARM. */
++ tune_params::DISPARAGE_FLAGS_ALL,
++ tune_params::PREF_NEON_64_FALSE,
++ tune_params::PREF_NEON_STRINGOPS_TRUE,
++ FUSE_OPS (tune_params::FUSE_AES_AESMC | tune_params::FUSE_MOVW_MOVT),
++ tune_params::SCHED_AUTOPREF_FULL
++};
++
+ /* armv7m tuning. On Cortex-M4 cores for example, MOVW/MOVT take a single
+ cycle to execute each. An LDR from the constant pool also takes two cycles
+ to execute, but mildly increases pipelining opportunity (consecutive
+@@ -2264,9 +2314,11 @@ static const struct processors *arm_selected_arch;
+ static const struct processors *arm_selected_cpu;
+ static const struct processors *arm_selected_tune;
+
+-/* The name of the preprocessor macro to define for this architecture. */
++/* The name of the preprocessor macro to define for this architecture. PROFILE
++ is replaced by the architecture name (eg. 8A) in arm_option_override () and
++ is thus chosen to be big enough to hold the longest architecture name. */
+
+-char arm_arch_name[] = "__ARM_ARCH_0UNK__";
++char arm_arch_name[] = "__ARM_ARCH_PROFILE__";
+
+ /* Available values for -mfpu=. */
+
+@@ -2907,7 +2959,8 @@ arm_option_override_internal (struct gcc_options *opts,
+ if (! opts_set->x_arm_restrict_it)
+ opts->x_arm_restrict_it = arm_arch8;
+
+- if (!TARGET_THUMB2_P (opts->x_target_flags))
++ /* ARM execution state and M profile don't have [restrict] IT. */
++ if (!TARGET_THUMB2_P (opts->x_target_flags) || !arm_arch_notm)
+ opts->x_arm_restrict_it = 0;
+
+ /* Enable -munaligned-access by default for
+@@ -2918,7 +2971,8 @@ arm_option_override_internal (struct gcc_options *opts,
+
+ Disable -munaligned-access by default for
+ - all pre-ARMv6 architecture-based processors
+- - ARMv6-M architecture-based processors. */
++ - ARMv6-M architecture-based processors
++ - ARMv8-M Baseline processors. */
+
+ if (! opts_set->x_unaligned_access)
+ {
+@@ -3170,6 +3224,8 @@ arm_option_override (void)
+ arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
+ arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
+ arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
++ arm_arch8_2 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_2);
++ arm_arch_thumb1 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
+ arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
+ arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
+
+@@ -3185,6 +3241,13 @@ arm_option_override (void)
+ arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
+ arm_arch_crc = ARM_FSET_HAS_CPU1 (insn_flags, FL_CRC32);
+ arm_m_profile_small_mul = ARM_FSET_HAS_CPU1 (insn_flags, FL_SMALLMUL);
++ arm_fp16_inst = ARM_FSET_HAS_CPU2 (insn_flags, FL2_FP16INST);
++ if (arm_fp16_inst)
++ {
++ if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
++ error ("selected fp16 options are incompatible.");
++ arm_fp16_format = ARM_FP16_FORMAT_IEEE;
++ }
+
+ /* V5 code we generate is completely interworking capable, so we turn off
+ TARGET_INTERWORK here to avoid many tests later on. */
+@@ -3298,6 +3361,20 @@ arm_option_override (void)
+ }
+ }
+
++ if (TARGET_VXWORKS_RTP)
++ {
++ if (!global_options_set.x_arm_pic_data_is_text_relative)
++ arm_pic_data_is_text_relative = 0;
++ }
++ else if (flag_pic
++ && !arm_pic_data_is_text_relative
++ && !(global_options_set.x_target_flags & MASK_SINGLE_PIC_BASE))
++ /* When text & data segments don't have a fixed displacement, the
++ intended use is with a single, read only, pic base register.
++ Unless the user explicitly requested not to do that, set
++ it. */
++ target_flags |= MASK_SINGLE_PIC_BASE;
++
+ /* If stack checking is disabled, we can use r10 as the PIC register,
+ which keeps r9 available. The EABI specifies r9 as the PIC register. */
+ if (flag_pic && TARGET_SINGLE_PIC_BASE)
+@@ -3329,10 +3406,6 @@ arm_option_override (void)
+ arm_pic_register = pic_register;
+ }
+
+- if (TARGET_VXWORKS_RTP
+- && !global_options_set.x_arm_pic_data_is_text_relative)
+- arm_pic_data_is_text_relative = 0;
+-
+ /* Enable -mfix-cortex-m3-ldrd by default for Cortex-M3 cores. */
+ if (fix_cm3_ldrd == 2)
+ {
+@@ -3899,7 +3972,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
+ {
+ case SET:
+ /* See if we can use movw. */
+- if (arm_arch_thumb2 && (i & 0xffff0000) == 0)
++ if (TARGET_HAVE_MOVT && (i & 0xffff0000) == 0)
+ return 1;
+ else
+ /* Otherwise, try mvn. */
+@@ -4118,7 +4191,7 @@ optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
+ yield a shorter sequence, we may as well use zero. */
+ insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
+ if (best_start != 0
+- && ((((unsigned HOST_WIDE_INT) 1) << best_start) < val))
++ && ((HOST_WIDE_INT_1U << best_start) < val))
+ {
+ insns2 = optimal_immediate_sequence_1 (code, val, &tmp_sequence, 0);
+ if (insns2 <= insns1)
+@@ -4949,7 +5022,7 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
+ if (mode == VOIDmode)
+ mode = GET_MODE (*op1);
+
+- maxval = (((unsigned HOST_WIDE_INT) 1) << (GET_MODE_BITSIZE(mode) - 1)) - 1;
++ maxval = (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1)) - 1;
+
+ /* For DImode, we have GE/LT/GEU/LTU comparisons. In ARM mode
+ we can also use cmp/cmpeq for GTU/LEU. GT/LE must be either
+@@ -5549,7 +5622,7 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
+ {
+ case REAL_TYPE:
+ mode = TYPE_MODE (type);
+- if (mode != DFmode && mode != SFmode)
++ if (mode != DFmode && mode != SFmode && mode != HFmode)
+ return -1;
+
+ if (*modep == VOIDmode)
+@@ -5797,11 +5870,16 @@ aapcs_vfp_is_call_candidate (CUMULATIVE_ARGS *pcum, machine_mode mode,
+ &pcum->aapcs_vfp_rcount);
+ }
+
++/* Implement the allocate field in aapcs_cp_arg_layout. See the comment there
++ for the behaviour of this function. */
++
+ static bool
+ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
+ const_tree type ATTRIBUTE_UNUSED)
+ {
+- int shift = GET_MODE_SIZE (pcum->aapcs_vfp_rmode) / GET_MODE_SIZE (SFmode);
++ int rmode_size
++ = MAX (GET_MODE_SIZE (pcum->aapcs_vfp_rmode), GET_MODE_SIZE (SFmode));
++ int shift = rmode_size / GET_MODE_SIZE (SFmode);
+ unsigned mask = (1 << (shift * pcum->aapcs_vfp_rcount)) - 1;
+ int regno;
+
+@@ -5850,6 +5928,9 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
+ return false;
+ }
+
++/* Implement the allocate_return_reg field in aapcs_cp_arg_layout. See the
++ comment there for the behaviour of this function. */
++
+ static rtx
+ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
+ machine_mode mode,
+@@ -5940,13 +6021,13 @@ static struct
+ required for a return from FUNCTION_ARG. */
+ bool (*allocate) (CUMULATIVE_ARGS *, machine_mode, const_tree);
+
+- /* Return true if a result of mode MODE (or type TYPE if MODE is
+- BLKmode) is can be returned in this co-processor's registers. */
++ /* Return true if a result of mode MODE (or type TYPE if MODE is BLKmode) can
++ be returned in this co-processor's registers. */
+ bool (*is_return_candidate) (enum arm_pcs, machine_mode, const_tree);
+
+- /* Allocate and return an RTX element to hold the return type of a
+- call, this routine must not fail and will only be called if
+- is_return_candidate returned true with the same parameters. */
++ /* Allocate and return an RTX element to hold the return type of a call. This
++ routine must not fail and will only be called if is_return_candidate
++ returned true with the same parameters. */
+ rtx (*allocate_return_reg) (enum arm_pcs, machine_mode, const_tree);
+
+ /* Finish processing this argument and prepare to start processing
+@@ -8214,6 +8295,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x)
+ static bool
+ thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
+ {
++ /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high
++ RTX. These RTX must therefore be allowed for Thumb-1 so that when run
++ for ARMv8-M Baseline or later the result is valid. */
++ if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
++ x = XEXP (x, 0);
++
+ return (CONST_INT_P (x)
+ || CONST_DOUBLE_P (x)
+ || CONSTANT_ADDRESS_P (x)
+@@ -8300,7 +8387,9 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
+ case CONST_INT:
+ if (outer == SET)
+ {
+- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
++ if (UINTVAL (x) < 256
++ /* 16-bit constant. */
++ || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0xffff0000)))
+ return 0;
+ if (thumb_shiftable_const (INTVAL (x)))
+ return COSTS_N_INSNS (2);
+@@ -8317,8 +8406,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
+ int i;
+ /* This duplicates the tests in the andsi3 expander. */
+ for (i = 9; i <= 31; i++)
+- if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
+- || (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
++ if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
++ || (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
+ return COSTS_N_INSNS (2);
+ }
+ else if (outer == ASHIFT || outer == ASHIFTRT
+@@ -9003,7 +9092,7 @@ static inline int
+ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
+ {
+ machine_mode mode = GET_MODE (x);
+- int words;
++ int words, cost;
+
+ switch (code)
+ {
+@@ -9049,17 +9138,27 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
+ /* A SET doesn't have a mode, so let's look at the SET_DEST to get
+ the mode. */
+ words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST (x))));
+- return COSTS_N_INSNS (words)
+- + COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x))
+- || satisfies_constraint_K (SET_SRC (x))
+- /* thumb1_movdi_insn. */
+- || ((words > 1) && MEM_P (SET_SRC (x))));
++ cost = COSTS_N_INSNS (words);
++ if (satisfies_constraint_J (SET_SRC (x))
++ || satisfies_constraint_K (SET_SRC (x))
++ /* Too big an immediate for a 2-byte mov, using MOVT. */
++ || (CONST_INT_P (SET_SRC (x))
++ && UINTVAL (SET_SRC (x)) >= 256
++ && TARGET_HAVE_MOVT
++ && satisfies_constraint_j (SET_SRC (x)))
++ /* thumb1_movdi_insn. */
++ || ((words > 1) && MEM_P (SET_SRC (x))))
++ cost += COSTS_N_INSNS (1);
++ return cost;
+
+ case CONST_INT:
+ if (outer == SET)
+ {
+- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
++ if (UINTVAL (x) < 256)
+ return COSTS_N_INSNS (1);
++ /* movw is 4byte long. */
++ if (TARGET_HAVE_MOVT && !(INTVAL (x) & 0xffff0000))
++ return COSTS_N_INSNS (2);
+ /* See split "TARGET_THUMB1 && satisfies_constraint_J". */
+ if (INTVAL (x) >= -255 && INTVAL (x) <= -1)
+ return COSTS_N_INSNS (2);
+@@ -9079,8 +9178,8 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
+ int i;
+ /* This duplicates the tests in the andsi3 expander. */
+ for (i = 9; i <= 31; i++)
+- if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
+- || (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
++ if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
++ || (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
+ return COSTS_N_INSNS (2);
+ }
+ else if (outer == ASHIFT || outer == ASHIFTRT
+@@ -10759,8 +10858,6 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
+ if ((arm_arch4 || GET_MODE (XEXP (x, 0)) == SImode)
+ && MEM_P (XEXP (x, 0)))
+ {
+- *cost = rtx_cost (XEXP (x, 0), VOIDmode, code, 0, speed_p);
+-
+ if (mode == DImode)
+ *cost += COSTS_N_INSNS (1);
+
+@@ -12257,7 +12354,7 @@ vfp3_const_double_index (rtx x)
+
+ /* We can permit four significant bits of mantissa only, plus a high bit
+ which is always 1. */
+- mask = ((unsigned HOST_WIDE_INT)1 << (point_pos - 5)) - 1;
++ mask = (HOST_WIDE_INT_1U << (point_pos - 5)) - 1;
+ if ((mantissa & mask) != 0)
+ return -1;
+
+@@ -13139,7 +13236,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
+ {
+ if (mode == HFmode)
+ {
+- if (!TARGET_NEON_FP16)
++ if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
+ return GENERAL_REGS;
+ if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
+ return NO_REGS;
+@@ -15976,14 +16073,17 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
+ /* If the same input register is used in both stores
+ when storing different constants, try to find a free register.
+ For example, the code
+- mov r0, 0
+- str r0, [r2]
+- mov r0, 1
+- str r0, [r2, #4]
++ mov r0, 0
++ str r0, [r2]
++ mov r0, 1
++ str r0, [r2, #4]
+ can be transformed into
+- mov r1, 0
+- strd r1, r0, [r2]
+- in Thumb mode assuming that r1 is free. */
++ mov r1, 0
++ mov r0, 1
++ strd r1, r0, [r2]
++ in Thumb mode assuming that r1 is free.
++ For ARM mode do the same but only if the starting register
++ can be made to be even. */
+ if (const_store
+ && REGNO (operands[0]) == REGNO (operands[1])
+ && INTVAL (operands[4]) != INTVAL (operands[5]))
+@@ -16002,7 +16102,6 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
+ }
+ else if (TARGET_ARM)
+ {
+- return false;
+ int regno = REGNO (operands[0]);
+ if (!peep2_reg_dead_p (4, operands[0]))
+ {
+@@ -16356,7 +16455,7 @@ get_jump_table_size (rtx_jump_table_data *insn)
+ {
+ case 1:
+ /* Round up size of TBB table to a halfword boundary. */
+- size = (size + 1) & ~(HOST_WIDE_INT)1;
++ size = (size + 1) & ~HOST_WIDE_INT_1;
+ break;
+ case 2:
+ /* No padding necessary for TBH. */
+@@ -18588,6 +18687,8 @@ output_move_vfp (rtx *operands)
+ rtx reg, mem, addr, ops[2];
+ int load = REG_P (operands[0]);
+ int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
++ int sp = (!TARGET_VFP_FP16INST
++ || GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
+ int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
+ const char *templ;
+ char buff[50];
+@@ -18600,8 +18701,10 @@ output_move_vfp (rtx *operands)
+
+ gcc_assert (REG_P (reg));
+ gcc_assert (IS_VFP_REGNUM (REGNO (reg)));
+- gcc_assert (mode == SFmode
++ gcc_assert ((mode == HFmode && TARGET_HARD_FLOAT && TARGET_VFP)
++ || mode == SFmode
+ || mode == DFmode
++ || mode == HImode
+ || mode == SImode
+ || mode == DImode
+ || (TARGET_NEON && VALID_NEON_DREG_MODE (mode)));
+@@ -18632,7 +18735,7 @@ output_move_vfp (rtx *operands)
+
+ sprintf (buff, templ,
+ load ? "ld" : "st",
+- dp ? "64" : "32",
++ dp ? "64" : sp ? "32" : "16",
+ dp ? "P" : "",
+ integer_p ? "\t%@ int" : "");
+ output_asm_insn (buff, ops);
+@@ -19058,7 +19161,8 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
+ return NULL;
+ }
+
+- *amountp = int_log2 (*amountp);
++ *amountp = exact_log2 (*amountp);
++ gcc_assert (IN_RANGE (*amountp, 0, 31));
+ return ARM_LSL_NAME;
+
+ default:
+@@ -19090,22 +19194,6 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
+ return mnem;
+ }
+
+-/* Obtain the shift from the POWER of two. */
+-
+-static HOST_WIDE_INT
+-int_log2 (HOST_WIDE_INT power)
+-{
+- HOST_WIDE_INT shift = 0;
+-
+- while ((((HOST_WIDE_INT) 1 << shift) & power) == 0)
+- {
+- gcc_assert (shift <= 31);
+- shift++;
+- }
+-
+- return shift;
+-}
+-
+ /* Output a .ascii pseudo-op, keeping track of lengths. This is
+ because /bin/as is horribly restrictive. The judgement about
+ whether or not each character is 'printable' (and can be output as
+@@ -22919,6 +23007,8 @@ maybe_get_arm_condition_code (rtx comparison)
+ {
+ case LTU: return ARM_CS;
+ case GEU: return ARM_CC;
++ case NE: return ARM_CS;
++ case EQ: return ARM_CC;
+ default: return ARM_NV;
+ }
+
+@@ -22944,6 +23034,14 @@ maybe_get_arm_condition_code (rtx comparison)
+ default: return ARM_NV;
+ }
+
++ case CC_Vmode:
++ switch (comp_code)
++ {
++ case NE: return ARM_VS;
++ case EQ: return ARM_VC;
++ default: return ARM_NV;
++ }
++
+ case CCmode:
+ switch (comp_code)
+ {
+@@ -23397,10 +23495,12 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
+ if (mode == DFmode)
+ return VFP_REGNO_OK_FOR_DOUBLE (regno);
+
+- /* VFP registers can hold HFmode values, but there is no point in
+- putting them there unless we have hardware conversion insns. */
+ if (mode == HFmode)
+- return TARGET_FP16 && VFP_REGNO_OK_FOR_SINGLE (regno);
++ return VFP_REGNO_OK_FOR_SINGLE (regno);
++
++ /* VFP registers can hold HImode values. */
++ if (mode == HImode)
++ return VFP_REGNO_OK_FOR_SINGLE (regno);
+
+ if (TARGET_NEON)
+ return (VALID_NEON_DREG_MODE (mode) && VFP_REGNO_OK_FOR_DOUBLE (regno))
+@@ -23604,26 +23704,6 @@ arm_debugger_arg_offset (int value, rtx addr)
+ return value;
+ }
+
+-/* Implement TARGET_INVALID_PARAMETER_TYPE. */
+-
+-static const char *
+-arm_invalid_parameter_type (const_tree t)
+-{
+- if (SCALAR_FLOAT_TYPE_P (t) && TYPE_PRECISION (t) == 16)
+- return N_("function parameters cannot have __fp16 type");
+- return NULL;
+-}
+-
+-/* Implement TARGET_INVALID_PARAMETER_TYPE. */
+-
+-static const char *
+-arm_invalid_return_type (const_tree t)
+-{
+- if (SCALAR_FLOAT_TYPE_P (t) && TYPE_PRECISION (t) == 16)
+- return N_("functions cannot return __fp16 type");
+- return NULL;
+-}
+-
+ /* Implement TARGET_PROMOTED_TYPE. */
+
+ static tree
+@@ -25847,13 +25927,6 @@ thumb_reload_out_hi (rtx *operands)
+ emit_insn (gen_thumb_movhi_clobber (operands[0], operands[1], operands[2]));
+ }
+
+-/* Handle reading a half-word from memory during reload. */
+-void
+-thumb_reload_in_hi (rtx *operands ATTRIBUTE_UNUSED)
+-{
+- gcc_unreachable ();
+-}
+-
+ /* Return the length of a function name prefix
+ that starts with the character 'c'. */
+ static int
+@@ -25991,7 +26064,7 @@ arm_file_start (void)
+ const char* pos = strchr (arm_selected_arch->name, '+');
+ if (pos)
+ {
+- char buf[15];
++ char buf[32];
+ gcc_assert (strlen (arm_selected_arch->name)
+ <= sizeof (buf) / sizeof (*pos));
+ strncpy (buf, arm_selected_arch->name,
+@@ -26133,11 +26206,10 @@ arm_internal_label (FILE *stream, const char *prefix, unsigned long labelno)
+
+ /* Output code to add DELTA to the first argument, and then jump
+ to FUNCTION. Used for C++ multiple inheritance. */
++
+ static void
+-arm_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
+- HOST_WIDE_INT delta,
+- HOST_WIDE_INT vcall_offset ATTRIBUTE_UNUSED,
+- tree function)
++arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,
++ HOST_WIDE_INT, tree function)
+ {
+ static int thunk_label = 0;
+ char label[256];
+@@ -26278,6 +26350,76 @@ arm_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
+ final_end_function ();
+ }
+
++/* MI thunk handling for TARGET_32BIT. */
++
++static void
++arm32_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,
++ HOST_WIDE_INT vcall_offset, tree function)
++{
++ /* On ARM, this_regno is R0 or R1 depending on
++ whether the function returns an aggregate or not.
++ */
++ int this_regno = (aggregate_value_p (TREE_TYPE (TREE_TYPE (function)),
++ function)
++ ? R1_REGNUM : R0_REGNUM);
++
++ rtx temp = gen_rtx_REG (Pmode, IP_REGNUM);
++ rtx this_rtx = gen_rtx_REG (Pmode, this_regno);
++ reload_completed = 1;
++ emit_note (NOTE_INSN_PROLOGUE_END);
++
++ /* Add DELTA to THIS_RTX. */
++ if (delta != 0)
++ arm_split_constant (PLUS, Pmode, NULL_RTX,
++ delta, this_rtx, this_rtx, false);
++
++ /* Add *(*THIS_RTX + VCALL_OFFSET) to THIS_RTX. */
++ if (vcall_offset != 0)
++ {
++ /* Load *THIS_RTX. */
++ emit_move_insn (temp, gen_rtx_MEM (Pmode, this_rtx));
++ /* Compute *THIS_RTX + VCALL_OFFSET. */
++ arm_split_constant (PLUS, Pmode, NULL_RTX, vcall_offset, temp, temp,
++ false);
++ /* Compute *(*THIS_RTX + VCALL_OFFSET). */
++ emit_move_insn (temp, gen_rtx_MEM (Pmode, temp));
++ emit_insn (gen_add3_insn (this_rtx, this_rtx, temp));
++ }
++
++ /* Generate a tail call to the target function. */
++ if (!TREE_USED (function))
++ {
++ assemble_external (function);
++ TREE_USED (function) = 1;
++ }
++ rtx funexp = XEXP (DECL_RTL (function), 0);
++ funexp = gen_rtx_MEM (FUNCTION_MODE, funexp);
++ rtx_insn * insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, NULL_RTX));
++ SIBLING_CALL_P (insn) = 1;
++
++ insn = get_insns ();
++ shorten_branches (insn);
++ final_start_function (insn, file, 1);
++ final (insn, file, 1);
++ final_end_function ();
++
++ /* Stop pretending this is a post-reload pass. */
++ reload_completed = 0;
++}
++
++/* Output code to add DELTA to the first argument, and then jump
++ to FUNCTION. Used for C++ multiple inheritance. */
++
++static void
++arm_output_mi_thunk (FILE *file, tree thunk, HOST_WIDE_INT delta,
++ HOST_WIDE_INT vcall_offset, tree function)
++{
++ if (TARGET_32BIT)
++ arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
++ else
++ arm_thumb1_mi_thunk (file, thunk, delta, vcall_offset, function);
++}
++
+ int
+ arm_emit_vector_const (FILE *file, rtx x)
+ {
+ {
+@@ -27733,7 +27875,7 @@ arm_preferred_rename_class (reg_class_t rclass)
+ return NO_REGS;
+ }
+
+-/* Compute the atrribute "length" of insn "*push_multi".
++/* Compute the attribute "length" of insn "*push_multi".
+ So this function MUST be kept in sync with that insn pattern. */
+ int
+ arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
+@@ -27750,6 +27892,11 @@ arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
+
+ /* Thumb2 mode. */
+ regno = REGNO (first_op);
++ /* For PUSH/STM under Thumb2 mode, we can use 16-bit encodings if the register
++ list is 8-bit. Normally this means all registers in the list must be
++ LO_REGS, that is (R0-R7). If any HI_REGS are used, then we must use 32-bit
++ encodings. There is one exception for PUSH: LR in HI_REGS can be used
++ with a 16-bit encoding. */
+ hi_reg = (REGNO_REG_CLASS (regno) == HI_REGS) && (regno != LR_REGNUM);
+ for (i = 1; i < num_saves && !hi_reg; i++)
+ {
+@@ -27762,6 +27909,56 @@ arm_attr_length_push_multi(rtx parallel_op, rtx first_op)
+ return 4;
+ }
+
++/* Compute the attribute "length" of insn. Currently, this function is used
++ for "*load_multiple_with_writeback", "*pop_multiple_with_return" and
++ "*pop_multiple_with_writeback_and_return". OPERANDS is the toplevel PARALLEL
++ rtx, RETURN_PC is true if OPERANDS contains a return insn. WRITE_BACK_P is
++ true if OPERANDS contains an insn which explicitly updates the base register. */
++
++int
++arm_attr_length_pop_multi (rtx *operands, bool return_pc, bool write_back_p)
++{
++ /* ARM mode. */
++ if (TARGET_ARM)
++ return 4;
++ /* Thumb1 mode. */
++ if (TARGET_THUMB1)
++ return 2;
++
++ rtx parallel_op = operands[0];
++ /* Initialize to elements number of PARALLEL. */
++ unsigned indx = XVECLEN (parallel_op, 0) - 1;
++ /* Initialize the value to base register. */
++ unsigned regno = REGNO (operands[1]);
++ /* Skip return and write back pattern.
++ We only need register pop pattern for later analysis. */
++ unsigned first_indx = 0;
++ first_indx += return_pc ? 1 : 0;
++ first_indx += write_back_p ? 1 : 0;
++
++ /* A pop operation can be done through LDM or POP. If the base register is SP
++ and write back is used, then an LDM is an alias of POP. */
++ bool pop_p = (regno == SP_REGNUM && write_back_p);
++ bool ldm_p = !pop_p;
++
++ /* Check base register for LDM. */
++ if (ldm_p && REGNO_REG_CLASS (regno) == HI_REGS)
++ return 4;
++
++ /* Check each register in the list. */
++ for (; indx >= first_indx; indx--)
++ {
++ regno = REGNO (XEXP (XVECEXP (parallel_op, 0, indx), 0));
++ /* For POP, PC in HI_REGS can be used with 16-bit encoding. See similar
++ comment in arm_attr_length_push_multi. */
++ if (REGNO_REG_CLASS (regno) == HI_REGS
++ && (regno != PC_REGNUM || ldm_p))
++ return 4;
++ }
++
++ return 2;
++}
++
+ /* Compute the number of instructions emitted by output_move_double. */
+ int
+ arm_count_output_move_double_insns (rtx *operands)
+@@ -27793,7 +27990,11 @@ vfp3_const_double_for_fract_bits (rtx operand)
+ HOST_WIDE_INT value = real_to_integer (&r0);
+ value = value & 0xffffffff;
+ if ((value != 0) && ( (value & (value - 1)) == 0))
+- return int_log2 (value);
++ {
++ int ret = exact_log2 (value);
++ gcc_assert (IN_RANGE (ret, 0, 31));
++ return ret;
++ }
+ }
+ }
+ return 0;
+@@ -28350,6 +28551,8 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
+ case V8QImode: gen = gen_neon_vuzpv8qi_internal; break;
+ case V8HImode: gen = gen_neon_vuzpv8hi_internal; break;
+ case V4HImode: gen = gen_neon_vuzpv4hi_internal; break;
++ case V8HFmode: gen = gen_neon_vuzpv8hf_internal; break;
++ case V4HFmode: gen = gen_neon_vuzpv4hf_internal; break;
+ case V4SImode: gen = gen_neon_vuzpv4si_internal; break;
+ case V2SImode: gen = gen_neon_vuzpv2si_internal; break;
+ case V2SFmode: gen = gen_neon_vuzpv2sf_internal; break;
+@@ -28423,6 +28626,8 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d)
+ case V8QImode: gen = gen_neon_vzipv8qi_internal; break;
+ case V8HImode: gen = gen_neon_vzipv8hi_internal; break;
+ case V4HImode: gen = gen_neon_vzipv4hi_internal; break;
++ case V8HFmode: gen = gen_neon_vzipv8hf_internal; break;
++ case V4HFmode: gen = gen_neon_vzipv4hf_internal; break;
+ case V4SImode: gen = gen_neon_vzipv4si_internal; break;
+ case V2SImode: gen = gen_neon_vzipv2si_internal; break;
+ case V2SFmode: gen = gen_neon_vzipv2sf_internal; break;
+@@ -28475,6 +28680,8 @@ arm_evpc_neon_vrev (struct expand_vec_perm_d *d)
+ case V8QImode: gen = gen_neon_vrev32v8qi; break;
+ case V8HImode: gen = gen_neon_vrev64v8hi; break;
+ case V4HImode: gen = gen_neon_vrev64v4hi; break;
++ case V8HFmode: gen = gen_neon_vrev64v8hf; break;
++ case V4HFmode: gen = gen_neon_vrev64v4hf; break;
+ default:
+ return false;
+ }
+@@ -28558,6 +28765,8 @@ arm_evpc_neon_vtrn (struct expand_vec_perm_d *d)
+ case V8QImode: gen = gen_neon_vtrnv8qi_internal; break;
+ case V8HImode: gen = gen_neon_vtrnv8hi_internal; break;
+ case V4HImode: gen = gen_neon_vtrnv4hi_internal; break;
++ case V8HFmode: gen = gen_neon_vtrnv8hf_internal; break;
++ case V4HFmode: gen = gen_neon_vtrnv4hf_internal; break;
+ case V4SImode: gen = gen_neon_vtrnv4si_internal; break;
+ case V2SImode: gen = gen_neon_vtrnv2si_internal; break;
+ case V2SFmode: gen = gen_neon_vtrnv2sf_internal; break;
+@@ -28633,6 +28842,8 @@ arm_evpc_neon_vext (struct expand_vec_perm_d *d)
+ case V8HImode: gen = gen_neon_vextv8hi; break;
+ case V2SImode: gen = gen_neon_vextv2si; break;
+ case V4SImode: gen = gen_neon_vextv4si; break;
++ case V4HFmode: gen = gen_neon_vextv4hf; break;
++ case V8HFmode: gen = gen_neon_vextv8hf; break;
+ case V2SFmode: gen = gen_neon_vextv2sf; break;
+ case V4SFmode: gen = gen_neon_vextv4sf; break;
+ case V2DImode: gen = gen_neon_vextv2di; break;
+@@ -29158,7 +29369,7 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
+ {
+ enum rtx_code code = GET_CODE (*comparison);
+ int code_int;
+- machine_mode mode = (GET_MODE (*op1) == VOIDmode)
++ machine_mode mode = (GET_MODE (*op1) == VOIDmode)
+ ? GET_MODE (*op2) : GET_MODE (*op1);
+
+ gcc_assert (GET_MODE (*op1) != VOIDmode || GET_MODE (*op2) != VOIDmode);
+@@ -29186,6 +29397,14 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
+ *op2 = force_reg (mode, *op2);
+ return true;
+
++ case HFmode:
++ if (!TARGET_VFP_FP16INST)
++ break;
++ /* FP16 comparisons are done in SF mode. */
++ mode = SFmode;
++ *op1 = convert_to_mode (mode, *op1, 1);
++ *op2 = convert_to_mode (mode, *op2, 1);
++ /* Fall through. */
+ case SFmode:
+ case DFmode:
+ if (!arm_float_compare_operand (*op1, mode))
+@@ -29732,11 +29951,57 @@ arm_macro_fusion_p (void)
+ return current_tune->fusible_ops != tune_params::FUSE_NOTHING;
+ }
+
++/* Return true if the two back-to-back sets PREV_SET, CURR_SET are suitable
++ for MOVW / MOVT macro fusion. */
++
++static bool
++arm_sets_movw_movt_fusible_p (rtx prev_set, rtx curr_set)
++{
++ /* We are trying to fuse
++ movw imm / movt imm
++ instructions as a group that gets scheduled together. */
++
++ rtx set_dest = SET_DEST (curr_set);
++
++ if (GET_MODE (set_dest) != SImode)
++ return false;
++
++ /* We are trying to match:
++ prev (movw) == (set (reg r0) (const_int imm16))
++ curr (movt) == (set (zero_extract (reg r0)
++ (const_int 16)
++ (const_int 16))
++ (const_int imm16_1))
++ or
++ prev (movw) == (set (reg r1)
++ (high (symbol_ref ("SYM"))))
++ curr (movt) == (set (reg r0)
++ (lo_sum (reg r1)
++ (symbol_ref ("SYM")))) */
++
++ if (GET_CODE (set_dest) == ZERO_EXTRACT)
++ {
++ if (CONST_INT_P (SET_SRC (curr_set))
++ && CONST_INT_P (SET_SRC (prev_set))
++ && REG_P (XEXP (set_dest, 0))
++ && REG_P (SET_DEST (prev_set))
++ && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
++ return true;
++
++ }
++ else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
++ && REG_P (SET_DEST (curr_set))
++ && REG_P (SET_DEST (prev_set))
++ && GET_CODE (SET_SRC (prev_set)) == HIGH
++ && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
++ return true;
++
++ return false;
++}
+
+ static bool
+ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
+ {
+- rtx set_dest;
+ rtx prev_set = single_set (prev);
+ rtx curr_set = single_set (curr);
+
+@@ -29754,54 +30019,26 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
+ && aarch_crypto_can_dual_issue (prev, curr))
+ return true;
+
+- if (current_tune->fusible_ops & tune_params::FUSE_MOVW_MOVT)
+- {
+- /* We are trying to fuse
+- movw imm / movt imm
+- instructions as a group that gets scheduled together. */
+-
+- set_dest = SET_DEST (curr_set);
+-
+- if (GET_MODE (set_dest) != SImode)
+- return false;
++ if (current_tune->fusible_ops & tune_params::FUSE_MOVW_MOVT
++ && arm_sets_movw_movt_fusible_p (prev_set, curr_set))
++ return true;
+
+- /* We are trying to match:
+- prev (movw) == (set (reg r0) (const_int imm16))
+- curr (movt) == (set (zero_extract (reg r0)
+- (const_int 16)
+- (const_int 16))
+- (const_int imm16_1))
+- or
+- prev (movw) == (set (reg r1)
+- (high (symbol_ref ("SYM"))))
+- curr (movt) == (set (reg r0)
+- (lo_sum (reg r1)
+- (symbol_ref ("SYM")))) */
+- if (GET_CODE (set_dest) == ZERO_EXTRACT)
+- {
+- if (CONST_INT_P (SET_SRC (curr_set))
+- && CONST_INT_P (SET_SRC (prev_set))
+- && REG_P (XEXP (set_dest, 0))
+- && REG_P (SET_DEST (prev_set))
+- && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+- return true;
+- }
+- else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+- && REG_P (SET_DEST (curr_set))
+- && REG_P (SET_DEST (prev_set))
+- && GET_CODE (SET_SRC (prev_set)) == HIGH
+- && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+- return true;
+- }
+ return false;
+ }
+
++/* Return true iff the instruction fusion described by OP is enabled. */
++bool
++arm_fusion_enabled_p (tune_params::fuse_ops op)
++{
++ return current_tune->fusible_ops & op;
++}
++
+ /* Implement the TARGET_ASAN_SHADOW_OFFSET hook. */
+
+ static unsigned HOST_WIDE_INT
+ arm_asan_shadow_offset (void)
+ {
+- return (unsigned HOST_WIDE_INT) 1 << 29;
++ return HOST_WIDE_INT_1U << 29;
+ }
+
+
+@@ -30306,4 +30543,113 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
+ return;
+ }
+
++
++/* Construct and return a PARALLEL RTX vector with elements numbering the
++ lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
++ the vector - from the perspective of the architecture. This does not
++ line up with GCC's perspective on lane numbers, so we end up with
++ different masks depending on our target endian-ness. The diagram
++ below may help. We must draw the distinction when building masks
++ which select one half of the vector. An instruction selecting
++ architectural low-lanes for a big-endian target must be described using
++ a mask selecting GCC high-lanes.
++
++ Big-Endian Little-Endian
++
++GCC 0 1 2 3 3 2 1 0
++ | x | x | x | x | | x | x | x | x |
++Architecture 3 2 1 0 3 2 1 0
++
++Low Mask: { 2, 3 } { 0, 1 }
++High Mask: { 0, 1 } { 2, 3 }
++*/
++
++rtx
++arm_simd_vect_par_cnst_half (machine_mode mode, bool high)
++{
++ int nunits = GET_MODE_NUNITS (mode);
++ rtvec v = rtvec_alloc (nunits / 2);
++ int high_base = nunits / 2;
++ int low_base = 0;
++ int base;
++ rtx t1;
++ int i;
++
++ if (BYTES_BIG_ENDIAN)
++ base = high ? low_base : high_base;
++ else
++ base = high ? high_base : low_base;
++
++ for (i = 0; i < nunits / 2; i++)
++ RTVEC_ELT (v, i) = GEN_INT (base + i);
++
++ t1 = gen_rtx_PARALLEL (mode, v);
++ return t1;
++}
++
++/* Check OP for validity as a PARALLEL RTX vector with elements
++ numbering the lanes of either the high (HIGH == TRUE) or low lanes,
++ from the perspective of the architecture. See the diagram above
++ arm_simd_vect_par_cnst_half for more details. */
++
++bool
++arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
++ bool high)
++{
++ rtx ideal = arm_simd_vect_par_cnst_half (mode, high);
++ HOST_WIDE_INT count_op = XVECLEN (op, 0);
++ HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
++ int i = 0;
++
++ if (!VECTOR_MODE_P (mode))
++ return false;
++
++ if (count_op != count_ideal)
++ return false;
++
++ for (i = 0; i < count_ideal; i++)
++ {
++ rtx elt_op = XVECEXP (op, 0, i);
++ rtx elt_ideal = XVECEXP (ideal, 0, i);
++
++ if (!CONST_INT_P (elt_op)
++ || INTVAL (elt_ideal) != INTVAL (elt_op))
++ return false;
++ }
++ return true;
++}
++
++/* Can output mi_thunk for all cases except for non-zero vcall_offset
++ in Thumb1. */
++static bool
++arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT vcall_offset,
++ const_tree)
++{
++ /* For now, we punt and do not handle this for TARGET_THUMB1. */
++ if (vcall_offset && TARGET_THUMB1)
++ return false;
++
++ /* Otherwise ok. */
++ return true;
++}
++
++/* Generate RTL for a conditional branch with rtx comparison CODE in
++ mode CC_MODE. The destination of the unlikely conditional branch
++ is LABEL_REF. */
++
++void
++arm_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode,
++ rtx label_ref)
++{
++ rtx x;
++ x = gen_rtx_fmt_ee (code, VOIDmode,
++ gen_rtx_REG (cc_mode, CC_REGNUM),
++ const0_rtx);
++
++ x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
++ gen_rtx_LABEL_REF (VOIDmode, label_ref),
++ pc_rtx);
++ emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
++}
++
+ #include "gt-arm.h"
+--- a/src/gcc/config/arm/arm.h
++++ b/src/gcc/config/arm/arm.h
+@@ -80,11 +80,6 @@ extern arm_cc arm_current_cc;
+ extern int arm_target_label;
+ extern int arm_ccfsm_state;
+ extern GTY(()) rtx arm_target_insn;
+-/* The label of the current constant pool. */
+-extern rtx pool_vector_label;
+-/* Set to 1 when a return insn is output, this means that the epilogue
+- is not needed. */
+-extern int return_used_this_function;
+ /* Callback to output language specific object attributes. */
+ extern void (*arm_lang_output_object_attributes_hook)(void);
+
+@@ -194,7 +189,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
+ /* FPU supports half-precision floating-point with NEON element load/store. */
+ #define TARGET_NEON_FP16 \
+ (TARGET_VFP \
+- && ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_NEON | FPU_FL_FP16))
++ && ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_NEON) \
++ && ARM_FPU_FSET_HAS (TARGET_FPU_FEATURES, FPU_FL_FP16))
+
+ /* FPU supports VFP half-precision floating-point. */
+ #define TARGET_FP16 \
+@@ -221,6 +217,13 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
+ /* FPU supports ARMv8.1 Adv.SIMD extensions. */
+ #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
+
++/* FPU supports the floating point FP16 instructions for ARMv8.2 and later. */
++#define TARGET_VFP_FP16INST \
++ (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 && arm_fp16_inst)
++
++/* FPU supports the AdvSIMD FP16 instructions for ARMv8.2 and later. */
++#define TARGET_NEON_FP16INST (TARGET_VFP_FP16INST && TARGET_NEON_RDMA)
++
+ /* Q-bit is present. */
+ #define TARGET_ARM_QBIT \
+ (TARGET_32BIT && arm_arch5e && (arm_arch_notm || arm_arch7))
+@@ -236,7 +239,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
+
+ /* Should MOVW/MOVT be used in preference to a constant pool. */
+ #define TARGET_USE_MOVT \
+- (arm_arch_thumb2 \
++ (TARGET_HAVE_MOVT \
+ && (arm_disable_literal_pool \
+ || (!optimize_size && !current_tune->prefer_constant_pool)))
+
+@@ -265,11 +268,22 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
+ || arm_arch7) && arm_arch_notm)
+
+ /* Nonzero if this chip supports load-acquire and store-release. */
+-#define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8)
++#define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
++
++/* Nonzero if this chip supports LDAEXD and STLEXD. */
++#define TARGET_HAVE_LDACQEXD (TARGET_ARM_ARCH >= 8 \
++ && TARGET_32BIT \
++ && arm_arch_notm)
++
++/* Nonzero if this chip provides the MOVW and MOVT instructions. */
++#define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8)
++
++/* Nonzero if this chip provides the CBZ and CBNZ instructions. */
++#define TARGET_HAVE_CBZ (arm_arch_thumb2 || arm_arch8)
+
+ /* Nonzero if integer division instructions supported. */
+ #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
+- || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
++ || (TARGET_THUMB && arm_arch_thumb_hwdiv))
+
+ /* Nonzero if disallow volatile memory access in IT block. */
+ #define TARGET_NO_VOLATILE_CE (arm_arch_no_volatile_ce)
+@@ -402,7 +416,9 @@ enum base_architecture
+ BASE_ARCH_7R = 7,
+ BASE_ARCH_7M = 7,
+ BASE_ARCH_7EM = 7,
+- BASE_ARCH_8A = 8
++ BASE_ARCH_8A = 8,
++ BASE_ARCH_8M_BASE = 8,
++ BASE_ARCH_8M_MAIN = 8
+ };
+
+ /* The major revision number of the ARM Architecture implemented by the target. */
+@@ -447,6 +463,13 @@ extern int arm_arch8;
+ /* Nonzero if this chip supports the ARM Architecture 8.1 extensions. */
+ extern int arm_arch8_1;
+
++/* Nonzero if this chip supports the ARM Architecture 8.2 extensions. */
++extern int arm_arch8_2;
++
++/* Nonzero if this chip supports the FP16 instructions extension of ARM
++ Architecture 8.2. */
++extern int arm_fp16_inst;
++
+ /* Nonzero if this chip can benefit from load scheduling. */
+ extern int arm_ld_sched;
+
+@@ -478,6 +501,9 @@ extern int arm_tune_cortex_a9;
+ interworking clean. */
+ extern int arm_cpp_interwork;
+
++/* Nonzero if chip supports Thumb 1. */
++extern int arm_arch_thumb1;
++
+ /* Nonzero if chip supports Thumb 2. */
+ extern int arm_arch_thumb2;
+
+@@ -2187,13 +2213,9 @@ extern int making_const_table;
+ #define TARGET_ARM_ARCH \
+ (arm_base_arch) \
+
+-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
+-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+-
+ /* The highest Thumb instruction set version supported by the chip. */
+-#define TARGET_ARM_ARCH_ISA_THUMB \
+- (arm_arch_thumb2 ? 2 \
+- : ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
++#define TARGET_ARM_ARCH_ISA_THUMB \
++ (arm_arch_thumb2 ? 2 : (arm_arch_thumb1 ? 1 : 0))
+
+ /* Expands to an upper-case char of the target's architectural
+ profile. */
+--- a/src/gcc/config/arm/arm.md
++++ b/src/gcc/config/arm/arm.md
+@@ -118,10 +118,10 @@
+ ; This can be "a" for ARM, "t" for either of the Thumbs, "32" for
+ ; TARGET_32BIT, "t1" or "t2" to specify a specific Thumb mode. "v6"
+ ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
+-; arm_arch6. "v6t2" for Thumb-2 with arm_arch6. This attribute is
+-; used to compute attribute "enabled", use type "any" to enable an
+-; alternative in all cases.
+-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
++; arm_arch6. "v6t2" for Thumb-2 with arm_arch6 and "v8mb" for ARMv8-M
++; Baseline. This attribute is used to compute attribute "enabled",
++; use type "any" to enable an alternative in all cases.
++(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,v8mb,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"
+ (const_string "any"))
+
+ (define_attr "arch_enabled" "no,yes"
+@@ -160,6 +160,10 @@
+ (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
+ (const_string "yes")
+
++ (and (eq_attr "arch" "v8mb")
++ (match_test "TARGET_THUMB1 && arm_arch8"))
++ (const_string "yes")
++
+ (and (eq_attr "arch" "avoid_neon_for_64bits")
+ (match_test "TARGET_NEON")
+ (not (match_test "TARGET_PREFER_NEON_64BITS")))
+@@ -177,6 +181,10 @@
+ (and (eq_attr "arch" "armv6_or_vfpv3")
+ (match_test "arm_arch6 || TARGET_VFP3"))
+ (const_string "yes")
++
++ (and (eq_attr "arch" "neon")
++ (match_test "TARGET_NEON"))
++ (const_string "yes")
+ ]
+
+ (const_string "no")))
+@@ -539,6 +547,32 @@
+ (set_attr "type" "multiple")]
+ )
+
++(define_expand "addv<mode>4"
++ [(match_operand:SIDI 0 "register_operand")
++ (match_operand:SIDI 1 "register_operand")
++ (match_operand:SIDI 2 "register_operand")
++ (match_operand 3 "")]
++ "TARGET_32BIT"
++{
++ emit_insn (gen_add<mode>3_compareV (operands[0], operands[1], operands[2]));
++ arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
++
++ DONE;
++})
++
++(define_expand "uaddv<mode>4"
++ [(match_operand:SIDI 0 "register_operand")
++ (match_operand:SIDI 1 "register_operand")
++ (match_operand:SIDI 2 "register_operand")
++ (match_operand 3 "")]
++ "TARGET_32BIT"
++{
++ emit_insn (gen_add<mode>3_compareC (operands[0], operands[1], operands[2]));
++ arm_gen_unlikely_cbranch (NE, CC_Cmode, operands[3]);
++
++ DONE;
++})
++
+ (define_expand "addsi3"
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (plus:SI (match_operand:SI 1 "s_register_operand" "")
+@@ -616,6 +650,165 @@
+ ]
+ )
+
++(define_insn_and_split "adddi3_compareV"
++ [(set (reg:CC_V CC_REGNUM)
++ (ne:CC_V
++ (plus:TI
++ (sign_extend:TI (match_operand:DI 1 "register_operand" "r"))
++ (sign_extend:TI (match_operand:DI 2 "register_operand" "r")))
++ (sign_extend:TI (plus:DI (match_dup 1) (match_dup 2)))))
++ (set (match_operand:DI 0 "register_operand" "=&r")
++ (plus:DI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "#"
++ "&& reload_completed"
++ [(parallel [(set (reg:CC_C CC_REGNUM)
++ (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
++ (match_dup 1)))
++ (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
++ (parallel [(set (reg:CC_V CC_REGNUM)
++ (ne:CC_V
++ (plus:DI (plus:DI
++ (sign_extend:DI (match_dup 4))
++ (sign_extend:DI (match_dup 5)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
++ (plus:DI (sign_extend:DI
++ (plus:SI (match_dup 4) (match_dup 5)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
++ (set (match_dup 3) (plus:SI (plus:SI
++ (match_dup 4) (match_dup 5))
++ (ltu:SI (reg:CC_C CC_REGNUM)
++ (const_int 0))))])]
++ "
++ {
++ operands[3] = gen_highpart (SImode, operands[0]);
++ operands[0] = gen_lowpart (SImode, operands[0]);
++ operands[4] = gen_highpart (SImode, operands[1]);
++ operands[1] = gen_lowpart (SImode, operands[1]);
++ operands[5] = gen_highpart (SImode, operands[2]);
++ operands[2] = gen_lowpart (SImode, operands[2]);
++ }"
++ [(set_attr "conds" "set")
++ (set_attr "length" "8")
++ (set_attr "type" "multiple")]
++)
++
++(define_insn "addsi3_compareV"
++ [(set (reg:CC_V CC_REGNUM)
++ (ne:CC_V
++ (plus:DI
++ (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
++ (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
++ (sign_extend:DI (plus:SI (match_dup 1) (match_dup 2)))))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (plus:SI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "adds%?\\t%0, %1, %2"
++ [(set_attr "conds" "set")
++ (set_attr "type" "alus_sreg")]
++)
++
++(define_insn "*addsi3_compareV_upper"
++ [(set (reg:CC_V CC_REGNUM)
++ (ne:CC_V
++ (plus:DI
++ (plus:DI
++ (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
++ (sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
++ (plus:DI (sign_extend:DI
++ (plus:SI (match_dup 1) (match_dup 2)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (plus:SI
++ (plus:SI (match_dup 1) (match_dup 2))
++ (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
++ "TARGET_32BIT"
++ "adcs%?\\t%0, %1, %2"
++ [(set_attr "conds" "set")
++ (set_attr "type" "adcs_reg")]
++)
++
++(define_insn_and_split "adddi3_compareC"
++ [(set (reg:CC_C CC_REGNUM)
++ (ne:CC_C
++ (plus:TI
++ (zero_extend:TI (match_operand:DI 1 "register_operand" "r"))
++ (zero_extend:TI (match_operand:DI 2 "register_operand" "r")))
++ (zero_extend:TI (plus:DI (match_dup 1) (match_dup 2)))))
++ (set (match_operand:DI 0 "register_operand" "=&r")
++ (plus:DI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "#"
++ "&& reload_completed"
++ [(parallel [(set (reg:CC_C CC_REGNUM)
++ (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
++ (match_dup 1)))
++ (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
++ (parallel [(set (reg:CC_C CC_REGNUM)
++ (ne:CC_C
++ (plus:DI (plus:DI
++ (zero_extend:DI (match_dup 4))
++ (zero_extend:DI (match_dup 5)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
++ (plus:DI (zero_extend:DI
++ (plus:SI (match_dup 4) (match_dup 5)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
++ (set (match_dup 3) (plus:SI
++ (plus:SI (match_dup 4) (match_dup 5))
++ (ltu:SI (reg:CC_C CC_REGNUM)
++ (const_int 0))))])]
++ "
++ {
++ operands[3] = gen_highpart (SImode, operands[0]);
++ operands[0] = gen_lowpart (SImode, operands[0]);
++ operands[4] = gen_highpart (SImode, operands[1]);
++ operands[5] = gen_highpart (SImode, operands[2]);
++ operands[1] = gen_lowpart (SImode, operands[1]);
++ operands[2] = gen_lowpart (SImode, operands[2]);
++ }"
++ [(set_attr "conds" "set")
++ (set_attr "length" "8")
++ (set_attr "type" "multiple")]
++)
++
++(define_insn "*addsi3_compareC_upper"
++ [(set (reg:CC_C CC_REGNUM)
++ (ne:CC_C
++ (plus:DI
++ (plus:DI
++ (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
++ (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))
++ (plus:DI (zero_extend:DI
++ (plus:SI (match_dup 1) (match_dup 2)))
++ (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (plus:SI
++ (plus:SI (match_dup 1) (match_dup 2))
++ (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
++ "TARGET_32BIT"
++ "adcs%?\\t%0, %1, %2"
++ [(set_attr "conds" "set")
++ (set_attr "type" "adcs_reg")]
++)
++
++(define_insn "addsi3_compareC"
++ [(set (reg:CC_C CC_REGNUM)
++ (ne:CC_C
++ (plus:DI
++ (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
++ (zero_extend:DI (match_operand:SI 2 "register_operand" "r")))
++ (zero_extend:DI
++ (plus:SI (match_dup 1) (match_dup 2)))))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (plus:SI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "adds%?\\t%0, %1, %2"
++ [(set_attr "conds" "set")
++ (set_attr "type" "alus_sreg")]
++)
++
+ (define_insn "addsi3_compare0"
+ [(set (reg:CC_NOOV CC_REGNUM)
+ (compare:CC_NOOV
+@@ -865,6 +1058,75 @@
+ (set_attr "type" "adcs_reg")]
+ )
+
++(define_expand "subv<mode>4"
++ [(match_operand:SIDI 0 "register_operand")
++ (match_operand:SIDI 1 "register_operand")
++ (match_operand:SIDI 2 "register_operand")
++ (match_operand 3 "")]
++ "TARGET_32BIT"
++{
++ emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
++ arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
++
++ DONE;
++})
++
++(define_expand "usubv<mode>4"
++ [(match_operand:SIDI 0 "register_operand")
++ (match_operand:SIDI 1 "register_operand")
++ (match_operand:SIDI 2 "register_operand")
++ (match_operand 3 "")]
++ "TARGET_32BIT"
++{
++ emit_insn (gen_sub<mode>3_compare1 (operands[0], operands[1], operands[2]));
++ arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
++
++ DONE;
++})
++
++(define_insn_and_split "subdi3_compare1"
++ [(set (reg:CC CC_REGNUM)
++ (compare:CC
++ (match_operand:DI 1 "register_operand" "r")
++ (match_operand:DI 2 "register_operand" "r")))
++ (set (match_operand:DI 0 "register_operand" "=&r")
++ (minus:DI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "#"
++ "&& reload_completed"
++ [(parallel [(set (reg:CC CC_REGNUM)
++ (compare:CC (match_dup 1) (match_dup 2)))
++ (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
++ (parallel [(set (reg:CC CC_REGNUM)
++ (compare:CC (match_dup 4) (match_dup 5)))
++ (set (match_dup 3) (minus:SI (minus:SI (match_dup 4) (match_dup 5))
++ (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
++ {
++ operands[3] = gen_highpart (SImode, operands[0]);
++ operands[0] = gen_lowpart (SImode, operands[0]);
++ operands[4] = gen_highpart (SImode, operands[1]);
++ operands[1] = gen_lowpart (SImode, operands[1]);
++ operands[5] = gen_highpart (SImode, operands[2]);
++ operands[2] = gen_lowpart (SImode, operands[2]);
++ }
++ [(set_attr "conds" "set")
++ (set_attr "length" "8")
++ (set_attr "type" "multiple")]
++)
++
++(define_insn "subsi3_compare1"
++ [(set (reg:CC CC_REGNUM)
++ (compare:CC
++ (match_operand:SI 1 "register_operand" "r")
++ (match_operand:SI 2 "register_operand" "r")))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (minus:SI (match_dup 1) (match_dup 2)))]
++ "TARGET_32BIT"
++ "subs%?\\t%0, %1, %2"
++ [(set_attr "conds" "set")
++ (set_attr "type" "alus_sreg")]
++)
++
+ (define_insn "*subsi3_carryin"
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
+ (minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I")
+@@ -2136,13 +2398,13 @@
+
+ for (i = 9; i <= 31; i++)
+ {
+- if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (operands[2]))
++ if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (operands[2]))
+ {
+ emit_insn (gen_extzv (operands[0], operands[1], GEN_INT (i),
+ const0_rtx));
+ DONE;
+ }
+- else if ((((HOST_WIDE_INT) 1) << i) - 1
++ else if ((HOST_WIDE_INT_1 << i) - 1
+ == ~INTVAL (operands[2]))
+ {
+ rtx shift = GEN_INT (i);
+@@ -2441,7 +2703,7 @@
+ {
+ int start_bit = INTVAL (operands[2]);
+ int width = INTVAL (operands[1]);
+- HOST_WIDE_INT mask = (((HOST_WIDE_INT)1) << width) - 1;
++ HOST_WIDE_INT mask = (HOST_WIDE_INT_1 << width) - 1;
+ rtx target, subtarget;
+
+ if (arm_arch_thumb2)
+@@ -3743,8 +4005,7 @@
+ {
+ rtx scratch1, scratch2;
+
+- if (CONST_INT_P (operands[2])
+- && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
++ if (operands[2] == CONST1_RTX (SImode))
+ {
+ emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
+ DONE;
+@@ -3789,7 +4050,7 @@
+ "TARGET_EITHER"
+ "
+ if (CONST_INT_P (operands[2])
+- && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
++ && (UINTVAL (operands[2])) > 31)
+ {
+ emit_insn (gen_movsi (operands[0], const0_rtx));
+ DONE;
+@@ -3817,8 +4078,7 @@
+ {
+ rtx scratch1, scratch2;
+
+- if (CONST_INT_P (operands[2])
+- && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
++ if (operands[2] == CONST1_RTX (SImode))
+ {
+ emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
+ DONE;
+@@ -3863,7 +4123,7 @@
+ "TARGET_EITHER"
+ "
+ if (CONST_INT_P (operands[2])
+- && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
++ && UINTVAL (operands[2]) > 31)
+ operands[2] = GEN_INT (31);
+ "
+ )
+@@ -3888,8 +4148,7 @@
+ {
+ rtx scratch1, scratch2;
+
+- if (CONST_INT_P (operands[2])
+- && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
++ if (operands[2] == CONST1_RTX (SImode))
+ {
+ emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
+ DONE;
+@@ -3934,7 +4193,7 @@
+ "TARGET_EITHER"
+ "
+ if (CONST_INT_P (operands[2])
+- && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
++ && (UINTVAL (operands[2])) > 31)
+ {
+ emit_insn (gen_movsi (operands[0], const0_rtx));
+ DONE;
+@@ -3968,7 +4227,7 @@
+ if (TARGET_32BIT)
+ {
+ if (CONST_INT_P (operands[2])
+- && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
++ && UINTVAL (operands[2]) > 31)
+ operands[2] = GEN_INT (INTVAL (operands[2]) % 32);
+ }
+ else /* TARGET_THUMB1 */
+@@ -4325,23 +4584,29 @@
+
+ ;; Division instructions
+ (define_insn "divsi3"
+- [(set (match_operand:SI 0 "s_register_operand" "=r")
+- (div:SI (match_operand:SI 1 "s_register_operand" "r")
+- (match_operand:SI 2 "s_register_operand" "r")))]
++ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
++ (div:SI (match_operand:SI 1 "s_register_operand" "r,r")
++ (match_operand:SI 2 "s_register_operand" "r,r")))]
+ "TARGET_IDIV"
+- "sdiv%?\t%0, %1, %2"
+- [(set_attr "predicable" "yes")
++ "@
++ sdiv%?\t%0, %1, %2
++ sdiv\t%0, %1, %2"
++ [(set_attr "arch" "32,v8mb")
++ (set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+ (set_attr "type" "sdiv")]
+ )
+
+ (define_insn "udivsi3"
+- [(set (match_operand:SI 0 "s_register_operand" "=r")
+- (udiv:SI (match_operand:SI 1 "s_register_operand" "r")
+- (match_operand:SI 2 "s_register_operand" "r")))]
++ [(set (match_operand:SI 0 "s_register_operand" "=r,r")
++ (udiv:SI (match_operand:SI 1 "s_register_operand" "r,r")
++ (match_operand:SI 2 "s_register_operand" "r,r")))]
+ "TARGET_IDIV"
+- "udiv%?\t%0, %1, %2"
+- [(set_attr "predicable" "yes")
++ "@
++ udiv%?\t%0, %1, %2
++ udiv\t%0, %1, %2"
++ [(set_attr "arch" "32,v8mb")
++ (set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+ (set_attr "type" "udiv")]
+ )
+@@ -4349,6 +4614,63 @@
+
+ ;; Unary arithmetic insns
+
++(define_expand "negvsi3"
++ [(match_operand:SI 0 "register_operand")
++ (match_operand:SI 1 "register_operand")
++ (match_operand 2 "")]
++ "TARGET_32BIT"
++{
++ emit_insn (gen_subsi3_compare (operands[0], const0_rtx, operands[1]));
++ arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
++
++ DONE;
++})
++
++(define_expand "negvdi3"
++ [(match_operand:DI 0 "register_operand")
++ (match_operand:DI 1 "register_operand")
++ (match_operand 2 "")]
++ "TARGET_ARM"
++{
++ emit_insn (gen_negdi2_compare (operands[0], operands[1]));
++ arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
++
++ DONE;
++})
++
++
++(define_insn_and_split "negdi2_compare"
++ [(set (reg:CC CC_REGNUM)
++ (compare:CC
++ (const_int 0)
++ (match_operand:DI 1 "register_operand" "0,r")))
++ (set (match_operand:DI 0 "register_operand" "=r,&r")
++ (minus:DI (const_int 0) (match_dup 1)))]
++ "TARGET_ARM"
++ "#"
++ "&& reload_completed"
++ [(parallel [(set (reg:CC CC_REGNUM)
++ (compare:CC (const_int 0) (match_dup 1)))
++ (set (match_dup 0) (minus:SI (const_int 0)
++ (match_dup 1)))])
++ (parallel [(set (reg:CC CC_REGNUM)
++ (compare:CC (const_int 0) (match_dup 3)))
++ (set (match_dup 2)
++ (minus:SI
++ (minus:SI (const_int 0) (match_dup 3))
++ (ltu:SI (reg:CC_C CC_REGNUM)
++ (const_int 0))))])]
++ {
++ operands[2] = gen_highpart (SImode, operands[0]);
++ operands[0] = gen_lowpart (SImode, operands[0]);
++ operands[3] = gen_highpart (SImode, operands[1]);
++ operands[1] = gen_lowpart (SImode, operands[1]);
++ }
++ [(set_attr "conds" "set")
++ (set_attr "length" "8")
++ (set_attr "type" "multiple")]
++)
++
+ (define_expand "negdi2"
+ [(parallel
+ [(set (match_operand:DI 0 "s_register_operand" "")
+@@ -4389,6 +4711,20 @@
+ (set_attr "type" "multiple")]
+ )
+
++(define_insn "*negsi2_carryin_compare"
++ [(set (reg:CC CC_REGNUM)
++ (compare:CC (const_int 0)
++ (match_operand:SI 1 "s_register_operand" "r")))
++ (set (match_operand:SI 0 "s_register_operand" "=r")
++ (minus:SI (minus:SI (const_int 0)
++ (match_dup 1))
++ (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
++ "TARGET_ARM"
++ "rscs\\t%0, %1, #0"
++ [(set_attr "conds" "set")
++ (set_attr "type" "alus_imm")]
++)
++
+ (define_expand "negsi2"
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (neg:SI (match_operand:SI 1 "s_register_operand" "")))]
+@@ -4853,7 +5189,7 @@
+ ""
+ )
+
+-/* DFmode -> HFmode conversions have to go through SFmode. */
++;; DFmode to HFmode conversions have to go through SFmode.
+ (define_expand "truncdfhf2"
+ [(set (match_operand:HF 0 "general_operand" "")
+ (float_truncate:HF
+@@ -5116,7 +5452,7 @@
+ (match_operator 5 "subreg_lowpart_operator"
+ [(match_operand:SI 4 "s_register_operand" "")]))))]
+ "TARGET_32BIT
+- && ((unsigned HOST_WIDE_INT) INTVAL (operands[3])
++ && (UINTVAL (operands[3])
+ == (GET_MODE_MASK (GET_MODE (operands[5]))
+ & (GET_MODE_MASK (GET_MODE (operands[5]))
+ << (INTVAL (operands[2])))))"
+@@ -5360,7 +5696,7 @@
+ ""
+ )
+
+-/* HFmode -> DFmode conversions have to go through SFmode. */
++;; HFmode -> DFmode conversions have to go through SFmode.
+ (define_expand "extendhfdf2"
+ [(set (match_operand:DF 0 "general_operand" "")
+ (float_extend:DF (match_operand:HF 1 "general_operand" "")))]
+@@ -5698,12 +6034,15 @@
+ ;; LO_SUM adds in the high bits. Fortunately these are opaque operations
+ ;; so this does not matter.
+ (define_insn "*arm_movt"
+- [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
+- (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
+- (match_operand:SI 2 "general_operand" "i")))]
+- "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
+- "movt%?\t%0, #:upper16:%c2"
+- [(set_attr "predicable" "yes")
++ [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r")
++ (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0,0")
++ (match_operand:SI 2 "general_operand" "i,i")))]
++ "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])"
++ "@
++ movt%?\t%0, #:upper16:%c2
++ movt\t%0, #:upper16:%c2"
++ [(set_attr "arch" "32,v8mb")
++ (set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+ (set_attr "length" "4")
+ (set_attr "type" "alu_sreg")]
+@@ -5725,6 +6064,7 @@
+ str%?\\t%1, %0"
+ [(set_attr "type" "mov_reg,mov_imm,mvn_imm,mov_imm,load1,store1")
+ (set_attr "predicable" "yes")
++ (set_attr "arch" "*,*,*,v6t2,*,*")
+ (set_attr "pool_range" "*,*,*,*,4096,*")
+ (set_attr "neg_pool_range" "*,*,*,*,4084,*")]
+ )
+@@ -5761,7 +6101,8 @@
+ [(set (match_operand:SI 0 "arm_general_register_operand" "")
+ (const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
+ (match_operand:SI 2 "const_int_operand" ""))))]
+- "TARGET_THUMB2
++ "TARGET_THUMB
++ && TARGET_HAVE_MOVT
+ && arm_disable_literal_pool
+ && reload_completed
+ && GET_CODE (operands[1]) == SYMBOL_REF"
+@@ -5792,8 +6133,7 @@
+ (define_split
+ [(set (match_operand:SI 0 "arm_general_register_operand" "")
+ (match_operand:SI 1 "general_operand" ""))]
+- "TARGET_32BIT
+- && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
++ "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+ && !flag_pic && !target_word_relocations
+ && !arm_tls_referenced_p (operands[1])"
+ [(clobber (const_int 0))]
+@@ -6361,7 +6701,7 @@
+ [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
+ (match_operand:HI 1 "general_operand" "rIk,K,n,r,mi"))]
+ "TARGET_ARM
+- && arm_arch4
++ && arm_arch4 && !(TARGET_HARD_FLOAT && TARGET_VFP)
+ && (register_operand (operands[0], HImode)
+ || register_operand (operands[1], HImode))"
+ "@
+@@ -6387,7 +6727,7 @@
+ (define_insn "*movhi_bytes"
+ [(set (match_operand:HI 0 "s_register_operand" "=r,r,r")
+ (match_operand:HI 1 "arm_rhs_operand" "I,rk,K"))]
+- "TARGET_ARM"
++ "TARGET_ARM && !(TARGET_HARD_FLOAT && TARGET_VFP)"
+ "@
+ mov%?\\t%0, %1\\t%@ movhi
+ mov%?\\t%0, %1\\t%@ movhi
+@@ -6395,7 +6735,7 @@
+ [(set_attr "predicable" "yes")
+ (set_attr "type" "mov_imm,mov_reg,mvn_imm")]
+ )
+-
++
+ ;; We use a DImode scratch because we may occasionally need an additional
+ ;; temporary if the address isn't offsettable -- push_reload doesn't seem
+ ;; to take any notice of the "o" constraints on reload_memory_operand operand.
+@@ -6517,7 +6857,7 @@
+ strb%?\\t%1, %0"
+ [(set_attr "type" "mov_reg,mov_reg,mov_imm,mov_imm,mvn_imm,load1,store1,load1,store1")
+ (set_attr "predicable" "yes")
+- (set_attr "predicable_short_it" "yes,yes,yes,no,no,no,no,no,no")
++ (set_attr "predicable_short_it" "yes,yes,no,yes,no,no,no,no,no")
+ (set_attr "arch" "t2,any,any,t2,any,t2,t2,any,any")
+ (set_attr "length" "2,4,4,2,4,2,2,4,4")]
+ )
+@@ -6547,7 +6887,7 @@
+ (define_insn "*arm32_movhf"
+ [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,r,r")
+ (match_operand:HF 1 "general_operand" " m,r,r,F"))]
+- "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_FP16)
++ "TARGET_32BIT && !(TARGET_HARD_FLOAT && TARGET_VFP)
+ && ( s_register_operand (operands[0], HFmode)
+ || s_register_operand (operands[1], HFmode))"
+ "*
+@@ -7365,6 +7705,24 @@
+ DONE;
+ }")
+
++(define_expand "cstorehf4"
++ [(set (match_operand:SI 0 "s_register_operand")
++ (match_operator:SI 1 "expandable_comparison_operator"
++ [(match_operand:HF 2 "s_register_operand")
++ (match_operand:HF 3 "arm_float_compare_operand")]))]
++ "TARGET_VFP_FP16INST"
++ {
++ if (!arm_validize_comparison (&operands[1],
++ &operands[2],
++ &operands[3]))
++ FAIL;
++
++ emit_insn (gen_cstore_cc (operands[0], operands[1],
++ operands[2], operands[3]));
++ DONE;
++ }
++)
++
+ (define_expand "cstoresf4"
+ [(set (match_operand:SI 0 "s_register_operand" "")
+ (match_operator:SI 1 "expandable_comparison_operator"
+@@ -7417,9 +7775,31 @@
+ rtx ccreg;
+
+ if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
+- &XEXP (operands[1], 1)))
++ &XEXP (operands[1], 1)))
+ FAIL;
+-
++
++ code = GET_CODE (operands[1]);
++ ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
++ XEXP (operands[1], 1), NULL_RTX);
++ operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
++ }"
++)
++
++(define_expand "movhfcc"
++ [(set (match_operand:HF 0 "s_register_operand")
++ (if_then_else:HF (match_operand 1 "arm_cond_move_operator")
++ (match_operand:HF 2 "s_register_operand")
++ (match_operand:HF 3 "s_register_operand")))]
++ "TARGET_VFP_FP16INST"
++ "
++ {
++ enum rtx_code code = GET_CODE (operands[1]);
++ rtx ccreg;
++
++ if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
++ &XEXP (operands[1], 1)))
++ FAIL;
++
+ code = GET_CODE (operands[1]);
+ ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
+ XEXP (operands[1], 1), NULL_RTX);
+@@ -7438,7 +7818,7 @@
+ enum rtx_code code = GET_CODE (operands[1]);
+ rtx ccreg;
+
+- if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
++ if (!arm_validize_comparison (&operands[1], &XEXP (operands[1], 0),
+ &XEXP (operands[1], 1)))
+ FAIL;
+
+@@ -7503,6 +7883,37 @@
+ (set_attr "type" "fcsel")]
+ )
+
++(define_insn "*cmovhf"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (if_then_else:HF (match_operator 1 "arm_vsel_comparison_operator"
++ [(match_operand 2 "cc_register" "") (const_int 0)])
++ (match_operand:HF 3 "s_register_operand" "t")
++ (match_operand:HF 4 "s_register_operand" "t")))]
++ "TARGET_VFP_FP16INST"
++ "*
++ {
++ enum arm_cond_code code = maybe_get_arm_condition_code (operands[1]);
++ switch (code)
++ {
++ case ARM_GE:
++ case ARM_GT:
++ case ARM_EQ:
++ case ARM_VS:
++ return \"vsel%d1.f16\\t%0, %3, %4\";
++ case ARM_LT:
++ case ARM_LE:
++ case ARM_NE:
++ case ARM_VC:
++ return \"vsel%D1.f16\\t%0, %4, %3\";
++ default:
++ gcc_unreachable ();
++ }
++ return \"\";
++ }"
++ [(set_attr "conds" "use")
++ (set_attr "type" "fcsel")]
++)
++
+ (define_insn_and_split "*movsicc_insn"
+ [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
+ (if_then_else:SI
+@@ -8152,8 +8563,8 @@
+ )
+
+ (define_insn "probe_stack"
+- [(set (match_operand 0 "memory_operand" "=m")
+- (unspec [(const_int 0)] UNSPEC_PROBE_STACK))]
++ [(set (match_operand:SI 0 "memory_operand" "=m")
++ (unspec:SI [(const_int 0)] UNSPEC_PROBE_STACK))]
+ "TARGET_32BIT"
+ "str%?\\tr0, %0"
+ [(set_attr "type" "store1")
+@@ -10220,8 +10631,8 @@
+ (match_operand 1 "const_int_operand" "")))
+ (clobber (match_scratch:SI 2 ""))]
+ "TARGET_ARM
+- && (((unsigned HOST_WIDE_INT) INTVAL (operands[1]))
+- == (((unsigned HOST_WIDE_INT) INTVAL (operands[1])) >> 24) << 24)"
++ && ((UINTVAL (operands[1]))
++ == ((UINTVAL (operands[1])) >> 24) << 24)"
+ [(set (match_dup 2) (zero_extend:SI (match_dup 0)))
+ (set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 1)))]
+ "
+@@ -10561,7 +10972,11 @@
+ }
+ "
+ [(set_attr "type" "load4")
+- (set_attr "predicable" "yes")]
++ (set_attr "predicable" "yes")
++ (set (attr "length")
++ (symbol_ref "arm_attr_length_pop_multi (operands,
++ /*return_pc=*/false,
++ /*write_back_p=*/true)"))]
+ )
+
+ ;; Pop with return (as used in epilogue RTL)
+@@ -10590,7 +11005,10 @@
+ }
+ "
+ [(set_attr "type" "load4")
+- (set_attr "predicable" "yes")]
++ (set_attr "predicable" "yes")
++ (set (attr "length")
++ (symbol_ref "arm_attr_length_pop_multi (operands, /*return_pc=*/true,
++ /*write_back_p=*/true)"))]
+ )
+
+ (define_insn "*pop_multiple_with_return"
+@@ -10610,7 +11028,10 @@
+ }
+ "
+ [(set_attr "type" "load4")
+- (set_attr "predicable" "yes")]
++ (set_attr "predicable" "yes")
++ (set (attr "length")
++ (symbol_ref "arm_attr_length_pop_multi (operands, /*return_pc=*/true,
++ /*write_back_p=*/false)"))]
+ )
+
+ ;; Load into PC and return
+@@ -10821,19 +11242,22 @@
+ (set_attr "predicable_short_it" "no")
+ (set_attr "type" "clz")])
+
+-(define_expand "ctzsi2"
+- [(set (match_operand:SI 0 "s_register_operand" "")
+- (ctz:SI (match_operand:SI 1 "s_register_operand" "")))]
++;; Keep this as a CTZ expression until after reload and then split
++;; into RBIT + CLZ. Since RBIT is represented as an UNSPEC it is unlikely
++;; to fold with any other expression.
++
++(define_insn_and_split "ctzsi2"
++ [(set (match_operand:SI 0 "s_register_operand" "=r")
++ (ctz:SI (match_operand:SI 1 "s_register_operand" "r")))]
+ "TARGET_32BIT && arm_arch_thumb2"
++ "#"
++ "&& reload_completed"
++ [(const_int 0)]
+ "
+- {
+- rtx tmp = gen_reg_rtx (SImode);
+- emit_insn (gen_rbitsi2 (tmp, operands[1]));
+- emit_insn (gen_clzsi2 (operands[0], tmp));
+- }
+- DONE;
+- "
+-)
++ emit_insn (gen_rbitsi2 (operands[0], operands[1]));
++ emit_insn (gen_clzsi2 (operands[0], operands[0]));
++ DONE;
++")
+
+ ;; V5E instructions.
+
+@@ -10957,13 +11381,16 @@
+ ;; We only care about the lower 16 bits of the constant
+ ;; being inserted into the upper 16 bits of the register.
+ (define_insn "*arm_movtas_ze"
+- [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "+r")
++ [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "+r,r")
+ (const_int 16)
+ (const_int 16))
+ (match_operand:SI 1 "const_int_operand" ""))]
+- "arm_arch_thumb2"
+- "movt%?\t%0, %L1"
+- [(set_attr "predicable" "yes")
++ "TARGET_HAVE_MOVT"
++ "@
++ movt%?\t%0, %L1
++ movt\t%0, %L1"
++ [(set_attr "arch" "32,v8mb")
++ (set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+ (set_attr "length" "4")
+ (set_attr "type" "alu_sreg")]
+--- /dev/null
++++ b/src/gcc/config/arm/arm_fp16.h
+@@ -0,0 +1,255 @@
++/* ARM FP16 intrinsics include file.
++
++ Copyright (C) 2016 Free Software Foundation, Inc.
++ Contributed by ARM Ltd.
++
++ This file is part of GCC.
++
++ GCC is free software; you can redistribute it and/or modify it
++ under the terms of the GNU General Public License as published
++ by the Free Software Foundation; either version 3, or (at your
++ option) any later version.
++
++ GCC is distributed in the hope that it will be useful, but WITHOUT
++ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
++ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
++ License for more details.
++
++ Under Section 7 of GPL version 3, you are granted additional
++ permissions described in the GCC Runtime Library Exception, version
++ 3.1, as published by the Free Software Foundation.
++
++ You should have received a copy of the GNU General Public License and
++ a copy of the GCC Runtime Library Exception along with this program;
++ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
++ <http://www.gnu.org/licenses/>. */
++
++#ifndef _GCC_ARM_FP16_H
++#define _GCC_ARM_FP16_H 1
++
++#ifdef __cplusplus
++extern "C" {
++#endif
++
++#include <stdint.h>
++
++/* Intrinsics for FP16 instructions. */
++#pragma GCC push_options
++#pragma GCC target ("fpu=fp-armv8")
++
++#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
++
++typedef __fp16 float16_t;
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vabsh_f16 (float16_t __a)
++{
++ return __builtin_neon_vabshf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vaddh_f16 (float16_t __a, float16_t __b)
++{
++ return __a + __b;
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtah_s32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtahssi (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtah_u32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtahusi (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_s32 (int32_t __a)
++{
++ return __builtin_neon_vcvthshf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_f16_u32 (uint32_t __a)
++{
++ return __builtin_neon_vcvthuhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_s32 (int32_t __a, const int __b)
++{
++ return __builtin_neon_vcvths_nhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vcvth_n_f16_u32 (uint32_t __a, const int __b)
++{
++ return __builtin_neon_vcvthu_nhf ((int32_t)__a, __b);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvth_n_s32_f16 (float16_t __a, const int __b)
++{
++ return __builtin_neon_vcvths_nsi (__a, __b);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvth_n_u32_f16 (float16_t __a, const int __b)
++{
++ return (uint32_t)__builtin_neon_vcvthu_nsi (__a, __b);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvth_s32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvthssi (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvth_u32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvthusi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtmh_s32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtmhssi (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtmh_u32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtmhusi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtnh_s32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtnhssi (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtnh_u32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtnhusi (__a);
++}
++
++__extension__ static __inline int32_t __attribute__ ((__always_inline__))
++vcvtph_s32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtphssi (__a);
++}
++
++__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
++vcvtph_u32_f16 (float16_t __a)
++{
++ return __builtin_neon_vcvtphusi (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vdivh_f16 (float16_t __a, float16_t __b)
++{
++ return __a / __b;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
++{
++ return __builtin_neon_vfmahf (__a, __b, __c);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
++{
++ return __builtin_neon_vfmshf (__a, __b, __c);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmaxnmh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_neon_vmaxnmhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vminnmh_f16 (float16_t __a, float16_t __b)
++{
++ return __builtin_neon_vminnmhf (__a, __b);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vmulh_f16 (float16_t __a, float16_t __b)
++{
++ return __a * __b;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vnegh_f16 (float16_t __a)
++{
++ return - __a;
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndah_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndahf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndh_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndih_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndihf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndmh_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndmhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndnh_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndnhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndph_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndphf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vrndxh_f16 (float16_t __a)
++{
++ return __builtin_neon_vrndxhf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vsqrth_f16 (float16_t __a)
++{
++ return __builtin_neon_vsqrthf (__a);
++}
++
++__extension__ static __inline float16_t __attribute__ ((__always_inline__))
++vsubh_f16 (float16_t __a, float16_t __b)
++{
++ return __a - __b;
++}
++
++#endif /* __ARM_FEATURE_FP16_SCALAR_ARITHMETIC */
++#pragma GCC pop_options
++
++#ifdef __cplusplus
++}
++#endif
++
++#endif
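
The new arm_fp16.h header above exposes the ACLE scalar half-precision intrinsics, guarded by __ARM_FEATURE_FP16_SCALAR_ARITHMETIC. A minimal usage sketch follows; the function name clamp_round and the exact target flags are assumptions rather than part of the patch (-march=armv8.2-a+fp16 with a hard-float ABI is one configuration this series enables):

#include <stdint.h>
#include <arm_fp16.h>

/* Clamp x to [lo, hi] with the IEEE maxNum/minNum operations, round to
   nearest-even, then convert the now-integral half to a signed int.  */
int32_t
clamp_round (float16_t x, float16_t lo, float16_t hi)
{
  float16_t c = vminnmh_f16 (vmaxnmh_f16 (x, lo), hi);
  return vcvth_s32_f16 (vrndnh_f16 (c));
}

Where the rounded value is not needed separately, the vcvtnh_s32_f16 intrinsic above performs the round-to-nearest conversion in a single step.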
+--- a/src/gcc/config/arm/arm_neon.h
++++ b/src/gcc/config/arm/arm_neon.h
+@@ -38,6 +38,7 @@
+ extern "C" {
+ #endif
+
++#include <arm_fp16.h>
+ #include <stdint.h>
+
+ typedef __simd64_int8_t int8x8_t;
+@@ -530,7 +531,7 @@ vadd_s32 (int32x2_t __a, int32x2_t __b)
+ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+ vadd_f32 (float32x2_t __a, float32x2_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a + __b;
+ #else
+ return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b);
+@@ -594,7 +595,7 @@ vaddq_s64 (int64x2_t __a, int64x2_t __b)
+ __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+ vaddq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a + __b;
+ #else
+ return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b);
+@@ -1030,7 +1031,7 @@ vmul_s32 (int32x2_t __a, int32x2_t __b)
+ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+ vmul_f32 (float32x2_t __a, float32x2_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a * __b;
+ #else
+ return (float32x2_t) __builtin_neon_vmulfv2sf (__a, __b);
+@@ -1077,7 +1078,7 @@ vmulq_s32 (int32x4_t __a, int32x4_t __b)
+ __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+ vmulq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a * __b;
+ #else
+ return (float32x4_t) __builtin_neon_vmulfv4sf (__a, __b);
+@@ -1678,7 +1679,7 @@ vsub_s32 (int32x2_t __a, int32x2_t __b)
+ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+ vsub_f32 (float32x2_t __a, float32x2_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a - __b;
+ #else
+ return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b);
+@@ -1742,7 +1743,7 @@ vsubq_s64 (int64x2_t __a, int64x2_t __b)
+ __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+ vsubq_f32 (float32x4_t __a, float32x4_t __b)
+ {
+-#ifdef __FAST_MATH
++#ifdef __FAST_MATH__
+ return __a - __b;
+ #else
+ return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b);
+@@ -2607,6 +2608,12 @@ vtst_p8 (poly8x8_t __a, poly8x8_t __b)
+ return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b);
+ }
+
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vtst_p16 (poly16x4_t __a, poly16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vtstv4hi ((int16x4_t) __a, (int16x4_t) __b);
++}
++
+ __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+ vtstq_s8 (int8x16_t __a, int8x16_t __b)
+ {
+@@ -2649,6 +2656,12 @@ vtstq_p8 (poly8x16_t __a, poly8x16_t __b)
+ return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b);
+ }
+
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vtstq_p16 (poly16x8_t __a, poly16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vtstv8hi ((int16x8_t) __a, (int16x8_t) __b);
++}
++
+ __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+ vabd_s8 (int8x8_t __a, int8x8_t __b)
+ {
+@@ -14830,6 +14843,855 @@ vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+
+ #pragma GCC pop_options
+
++ /* Intrinsics for FP16 instructions. */
++#pragma GCC push_options
++#pragma GCC target ("fpu=neon-fp-armv8")
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vabd_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vabdv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vabdq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vabdv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vabs_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vabsv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vabsq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vabsv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vadd_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vaddv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vaddq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vaddv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcage_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcagev4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcageq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcagev8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcagt_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcagtv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcagtq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcagtv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcale_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcalev4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcaleq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcalev8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcalt_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcaltv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcaltq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcaltv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vceq_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vceqv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vceqq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vceqv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vceqz_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vceqzv4hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vceqzq_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vceqzv8hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcge_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcgev4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcgeq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcgev8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcgez_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcgezv4hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcgezq_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcgezv8hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcgt_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcgtv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcgtq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcgtv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcgtz_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcgtzv4hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcgtzq_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcgtzv8hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcle_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vclev4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcleq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vclev8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vclez_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vclezv4hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vclezq_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vclezv8hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vclt_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return (uint16x4_t)__builtin_neon_vcltv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcltq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return (uint16x8_t)__builtin_neon_vcltv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcltz_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcltzv4hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcltzq_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcltzv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vcvt_f16_s16 (int16x4_t __a)
++{
++ return (float16x4_t)__builtin_neon_vcvtsv4hi (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vcvt_f16_u16 (uint16x4_t __a)
++{
++ return (float16x4_t)__builtin_neon_vcvtuv4hi ((int16x4_t)__a);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvt_s16_f16 (float16x4_t __a)
++{
++ return (int16x4_t)__builtin_neon_vcvtsv4hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvt_u16_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcvtuv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vcvtq_f16_s16 (int16x8_t __a)
++{
++ return (float16x8_t)__builtin_neon_vcvtsv8hi (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vcvtq_f16_u16 (uint16x8_t __a)
++{
++ return (float16x8_t)__builtin_neon_vcvtuv8hi ((int16x8_t)__a);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtq_s16_f16 (float16x8_t __a)
++{
++ return (int16x8_t)__builtin_neon_vcvtsv8hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtq_u16_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcvtuv8hf (__a);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvta_s16_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vcvtasv4hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvta_u16_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcvtauv4hf (__a);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtaq_s16_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vcvtasv8hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtaq_u16_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcvtauv8hf (__a);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvtm_s16_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vcvtmsv4hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvtm_u16_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcvtmuv4hf (__a);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtmq_s16_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vcvtmsv8hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtmq_u16_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcvtmuv8hf (__a);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvtn_s16_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vcvtnsv4hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvtn_u16_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcvtnuv4hf (__a);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtnq_s16_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vcvtnsv8hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtnq_u16_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcvtnuv8hf (__a);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvtp_s16_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vcvtpsv4hf (__a);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvtp_u16_f16 (float16x4_t __a)
++{
++ return (uint16x4_t)__builtin_neon_vcvtpuv4hf (__a);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtpq_s16_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vcvtpsv8hf (__a);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtpq_u16_f16 (float16x8_t __a)
++{
++ return (uint16x8_t)__builtin_neon_vcvtpuv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vcvt_n_f16_s16 (int16x4_t __a, const int __b)
++{
++ return __builtin_neon_vcvts_nv4hi (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
++{
++ return __builtin_neon_vcvtu_nv4hi ((int16x4_t)__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
++{
++ return __builtin_neon_vcvts_nv8hi (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
++{
++ return __builtin_neon_vcvtu_nv8hi ((int16x8_t)__a, __b);
++}
++
++__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
++vcvt_n_s16_f16 (float16x4_t __a, const int __b)
++{
++ return __builtin_neon_vcvts_nv4hf (__a, __b);
++}
++
++__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
++vcvt_n_u16_f16 (float16x4_t __a, const int __b)
++{
++ return (uint16x4_t)__builtin_neon_vcvtu_nv4hf (__a, __b);
++}
++
++__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
++vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
++{
++ return __builtin_neon_vcvts_nv8hf (__a, __b);
++}
++
++__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
++vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
++{
++ return (uint16x8_t)__builtin_neon_vcvtu_nv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
++{
++ return __builtin_neon_vfmav4hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
++{
++ return __builtin_neon_vfmav8hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
++{
++ return __builtin_neon_vfmsv4hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
++{
++ return __builtin_neon_vfmsv8hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmax_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vmaxfv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmaxq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vmaxfv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vmaxnmv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vmaxnmv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmin_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vminfv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vminq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vminfv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vminnm_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vminnmv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vminnmq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vminnmv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmul_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vmulfv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __c)
++{
++ return __builtin_neon_vmul_lanev4hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmul_n_f16 (float16x4_t __a, float16_t __b)
++{
++ return __builtin_neon_vmul_nv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmulq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vmulfv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __c)
++{
++ return __builtin_neon_vmul_lanev8hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmulq_n_f16 (float16x8_t __a, float16_t __b)
++{
++ return __builtin_neon_vmul_nv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vneg_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vnegv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vnegq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vnegv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vpadd_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vpaddv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vpmax_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vpmaxfv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vpmin_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vpminfv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrecpe_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrecpev4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrecpeq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrecpev8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrnd_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrnda_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndav4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndaq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndav8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrndm_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndmv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndmq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndmv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrndn_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndnv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndnq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndnv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrndp_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndpv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndpq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndpv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrndx_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrndxv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrndxq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrndxv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrsqrte_f16 (float16x4_t __a)
++{
++ return __builtin_neon_vrsqrtev4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrsqrteq_f16 (float16x8_t __a)
++{
++ return __builtin_neon_vrsqrtev8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrecps_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vrecpsv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vrecpsv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vrsqrtsv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vrsqrtsv8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vsub_f16 (float16x4_t __a, float16x4_t __b)
++{
++ return __builtin_neon_vsubv4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vsubq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ return __builtin_neon_vsubv8hf (__a, __b);
++}
++
++#endif /* __ARM_FEATURE_FP16_VECTOR_ARITHMETIC. */
++#pragma GCC pop_options
++
++ /* Half-precision data processing intrinsics. */
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vbsl_f16 (uint16x4_t __a, float16x4_t __b, float16x4_t __c)
++{
++ return __builtin_neon_vbslv4hf ((int16x4_t)__a, __b, __c);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vbslq_f16 (uint16x8_t __a, float16x8_t __b, float16x8_t __c)
++{
++ return __builtin_neon_vbslv8hf ((int16x8_t)__a, __b, __c);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vdup_n_f16 (float16_t __a)
++{
++ return __builtin_neon_vdup_nv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vdupq_n_f16 (float16_t __a)
++{
++ return __builtin_neon_vdup_nv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vdup_lane_f16 (float16x4_t __a, const int __b)
++{
++ return __builtin_neon_vdup_lanev4hf (__a, __b);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vdupq_lane_f16 (float16x4_t __a, const int __b)
++{
++ return __builtin_neon_vdup_lanev8hf (__a, __b);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vext_f16 (float16x4_t __a, float16x4_t __b, const int __c)
++{
++ return __builtin_neon_vextv4hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vextq_f16 (float16x8_t __a, float16x8_t __b, const int __c)
++{
++ return __builtin_neon_vextv8hf (__a, __b, __c);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vmov_n_f16 (float16_t __a)
++{
++ return __builtin_neon_vdup_nv4hf (__a);
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vmovq_n_f16 (float16_t __a)
++{
++ return __builtin_neon_vdup_nv8hf (__a);
++}
++
++__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
++vrev64_f16 (float16x4_t __a)
++{
++ return (float16x4_t)__builtin_shuffle (__a, (uint16x4_t){ 3, 2, 1, 0 });
++}
++
++__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
++vrev64q_f16 (float16x8_t __a)
++{
++ return
++ (float16x8_t)__builtin_shuffle (__a,
++ (uint16x8_t){ 3, 2, 1, 0, 7, 6, 5, 4 });
++}
++
++__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
++vtrn_f16 (float16x4_t __a, float16x4_t __b)
++{
++ float16x4x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 5, 1, 7, 3 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 0, 6, 2 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 4, 2, 6 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 1, 5, 3, 7 });
++#endif
++ return __rv;
++}
++
++__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
++vtrnq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ float16x8x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 9, 1, 11, 3, 13, 5, 15, 7 });
++ __rv.val[1] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 8, 0, 10, 2, 12, 4, 14, 6 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 0, 8, 2, 10, 4, 12, 6, 14 });
++ __rv.val[1] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 1, 9, 3, 11, 5, 13, 7, 15 });
++#endif
++ return __rv;
++}
++
++__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
++vuzp_f16 (float16x4_t __a, float16x4_t __b)
++{
++ float16x4x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 5, 7, 1, 3 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 6, 0, 2 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 2, 4, 6 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 1, 3, 5, 7 });
++#endif
++ return __rv;
++}
++
++__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
++vuzpq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ float16x8x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
++ { 5, 7, 1, 3, 13, 15, 9, 11 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
++ { 4, 6, 0, 2, 12, 14, 8, 10 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 0, 2, 4, 6, 8, 10, 12, 14 });
++ __rv.val[1] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 1, 3, 5, 7, 9, 11, 13, 15 });
++#endif
++ return __rv;
++}
++
++__extension__ static __inline float16x4x2_t __attribute__ ((__always_inline__))
++vzip_f16 (float16x4_t __a, float16x4_t __b)
++{
++ float16x4x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 6, 2, 7, 3 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 4, 0, 5, 1 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x4_t){ 0, 4, 1, 5 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x4_t){ 2, 6, 3, 7 });
++#endif
++ return __rv;
++}
++
++__extension__ static __inline float16x8x2_t __attribute__ ((__always_inline__))
++vzipq_f16 (float16x8_t __a, float16x8_t __b)
++{
++ float16x8x2_t __rv;
++#ifdef __ARM_BIG_ENDIAN
++ __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
++ { 10, 2, 11, 3, 8, 0, 9, 1 });
++ __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
++ { 14, 6, 15, 7, 12, 4, 13, 5 });
++#else
++ __rv.val[0] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 0, 8, 1, 9, 2, 10, 3, 11 });
++ __rv.val[1] = __builtin_shuffle (__a, __b,
++ (uint16x8_t){ 4, 12, 5, 13, 6, 14, 7, 15 });
++#endif
++ return __rv;
++}
++
++#endif
++
+ #ifdef __cplusplus
+ }
+ #endif
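
The arm_neon.h changes above add the half-precision vector intrinsics, compiled under fpu=neon-fp-armv8 and guarded by __ARM_FEATURE_FP16_VECTOR_ARITHMETIC. A minimal sketch of how they combine (the function names are illustrative only, not part of the patch):

#include <arm_neon.h>

/* Fused multiply-accumulate across eight half-precision lanes:
   acc + a * b with a single rounding per lane.  */
float16x8_t
axpy8 (float16x8_t acc, float16x8_t a, float16x8_t b)
{
  return vfmaq_f16 (acc, a, b);
}

/* Lane-wise "x >= 0" mask, usable with vbslq_f16 for branch-free selects.  */
uint16x8_t
nonnegative8 (float16x8_t x)
{
  return vcgezq_f16 (x);
}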
+--- a/src/gcc/config/arm/arm_neon_builtins.def
++++ b/src/gcc/config/arm/arm_neon_builtins.def
+@@ -19,6 +19,7 @@
+ <http://www.gnu.org/licenses/>. */
+
+ VAR2 (BINOP, vadd, v2sf, v4sf)
++VAR2 (BINOP, vadd, v8hf, v4hf)
+ VAR3 (BINOP, vaddls, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vaddlu, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vaddws, v8qi, v4hi, v2si)
+@@ -32,12 +33,15 @@ VAR8 (BINOP, vqaddu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR3 (BINOP, vaddhn, v8hi, v4si, v2di)
+ VAR3 (BINOP, vraddhn, v8hi, v4si, v2di)
+ VAR2 (BINOP, vmulf, v2sf, v4sf)
++VAR2 (BINOP, vmulf, v8hf, v4hf)
+ VAR2 (BINOP, vmulp, v8qi, v16qi)
+ VAR8 (TERNOP, vmla, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+ VAR3 (TERNOP, vmlals, v8qi, v4hi, v2si)
+ VAR3 (TERNOP, vmlalu, v8qi, v4hi, v2si)
+ VAR2 (TERNOP, vfma, v2sf, v4sf)
++VAR2 (TERNOP, vfma, v4hf, v8hf)
+ VAR2 (TERNOP, vfms, v2sf, v4sf)
++VAR2 (TERNOP, vfms, v4hf, v8hf)
+ VAR8 (TERNOP, vmls, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+ VAR3 (TERNOP, vmlsls, v8qi, v4hi, v2si)
+ VAR3 (TERNOP, vmlslu, v8qi, v4hi, v2si)
+@@ -94,6 +98,7 @@ VAR8 (TERNOP_IMM, vsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR8 (TERNOP_IMM, vrsras_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR8 (TERNOP_IMM, vrsrau_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR2 (BINOP, vsub, v2sf, v4sf)
++VAR2 (BINOP, vsub, v8hf, v4hf)
+ VAR3 (BINOP, vsubls, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vsublu, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vsubws, v8qi, v4hi, v2si)
+@@ -111,12 +116,27 @@ VAR8 (BINOP, vcgt, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+ VAR6 (BINOP, vcgtu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR2 (BINOP, vcage, v2sf, v4sf)
+ VAR2 (BINOP, vcagt, v2sf, v4sf)
++VAR2 (BINOP, vcage, v4hf, v8hf)
++VAR2 (BINOP, vcagt, v4hf, v8hf)
++VAR2 (BINOP, vcale, v4hf, v8hf)
++VAR2 (BINOP, vcalt, v4hf, v8hf)
++VAR2 (BINOP, vceq, v4hf, v8hf)
++VAR2 (BINOP, vcge, v4hf, v8hf)
++VAR2 (BINOP, vcgt, v4hf, v8hf)
++VAR2 (BINOP, vcle, v4hf, v8hf)
++VAR2 (BINOP, vclt, v4hf, v8hf)
++VAR2 (UNOP, vceqz, v4hf, v8hf)
++VAR2 (UNOP, vcgez, v4hf, v8hf)
++VAR2 (UNOP, vcgtz, v4hf, v8hf)
++VAR2 (UNOP, vclez, v4hf, v8hf)
++VAR2 (UNOP, vcltz, v4hf, v8hf)
+ VAR6 (BINOP, vtst, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vabds, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vabdu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR2 (BINOP, vabdf, v2sf, v4sf)
+ VAR3 (BINOP, vabdls, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vabdlu, v8qi, v4hi, v2si)
++VAR2 (BINOP, vabd, v8hf, v4hf)
+
+ VAR6 (TERNOP, vabas, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (TERNOP, vabau, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+@@ -126,27 +146,38 @@ VAR3 (TERNOP, vabalu, v8qi, v4hi, v2si)
+ VAR6 (BINOP, vmaxs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vmaxu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR2 (BINOP, vmaxf, v2sf, v4sf)
++VAR2 (BINOP, vmaxf, v8hf, v4hf)
++VAR2 (BINOP, vmaxnm, v4hf, v8hf)
+ VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR2 (BINOP, vminf, v2sf, v4sf)
++VAR2 (BINOP, vminf, v4hf, v8hf)
++VAR2 (BINOP, vminnm, v8hf, v4hf)
+
+ VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si)
+ VAR1 (BINOP, vpmaxf, v2sf)
++VAR1 (BINOP, vpmaxf, v4hf)
+ VAR3 (BINOP, vpmins, v8qi, v4hi, v2si)
+ VAR3 (BINOP, vpminu, v8qi, v4hi, v2si)
+ VAR1 (BINOP, vpminf, v2sf)
++VAR1 (BINOP, vpminf, v4hf)
+
+ VAR4 (BINOP, vpadd, v8qi, v4hi, v2si, v2sf)
++VAR1 (BINOP, vpadd, v4hf)
+ VAR6 (UNOP, vpaddls, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (UNOP, vpaddlu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vpadals, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR6 (BINOP, vpadalu, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR2 (BINOP, vrecps, v2sf, v4sf)
+ VAR2 (BINOP, vrsqrts, v2sf, v4sf)
++VAR2 (BINOP, vrecps, v4hf, v8hf)
++VAR2 (BINOP, vrsqrts, v4hf, v8hf)
+ VAR8 (TERNOP_IMM, vsri_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR8 (TERNOP_IMM, vsli_n, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
+ VAR8 (UNOP, vabs, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
++VAR2 (UNOP, vabs, v8hf, v4hf)
++VAR2 (UNOP, vneg, v8hf, v4hf)
+ VAR6 (UNOP, vqabs, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR8 (UNOP, vneg, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+ VAR6 (UNOP, vqneg, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+@@ -155,8 +186,16 @@ VAR6 (UNOP, vclz, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
+ VAR5 (BSWAP, bswap, v4hi, v8hi, v2si, v4si, v2di)
+ VAR2 (UNOP, vcnt, v8qi, v16qi)
+ VAR4 (UNOP, vrecpe, v2si, v2sf, v4si, v4sf)
++VAR2 (UNOP, vrecpe, v8hf, v4hf)
+ VAR4 (UNOP, vrsqrte, v2si, v2sf, v4si, v4sf)
++VAR2 (UNOP, vrsqrte, v4hf, v8hf)
+ VAR6 (UNOP, vmvn, v8qi, v4hi, v2si, v16qi, v8hi, v4si)
++VAR2 (UNOP, vrnd, v8hf, v4hf)
++VAR2 (UNOP, vrnda, v8hf, v4hf)
++VAR2 (UNOP, vrndm, v8hf, v4hf)
++VAR2 (UNOP, vrndn, v8hf, v4hf)
++VAR2 (UNOP, vrndp, v8hf, v4hf)
++VAR2 (UNOP, vrndx, v8hf, v4hf)
+ /* FIXME: vget_lane supports more variants than this! */
+ VAR10 (GETLANE, vget_lane,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
+@@ -166,8 +205,10 @@ VAR10 (SETLANE, vset_lane,
+ VAR5 (UNOP, vcreate, v8qi, v4hi, v2si, v2sf, di)
+ VAR10 (UNOP, vdup_n,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
++VAR2 (UNOP, vdup_n, v8hf, v4hf)
+ VAR10 (GETLANE, vdup_lane,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
++VAR2 (GETLANE, vdup_lane, v8hf, v4hf)
+ VAR6 (COMBINE, vcombine, v8qi, v4hi, v4hf, v2si, v2sf, di)
+ VAR6 (UNOP, vget_high, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
+ VAR6 (UNOP, vget_low, v16qi, v8hi, v8hf, v4si, v4sf, v2di)
+@@ -177,7 +218,7 @@ VAR3 (UNOP, vqmovnu, v8hi, v4si, v2di)
+ VAR3 (UNOP, vqmovun, v8hi, v4si, v2di)
+ VAR3 (UNOP, vmovls, v8qi, v4hi, v2si)
+ VAR3 (UNOP, vmovlu, v8qi, v4hi, v2si)
+-VAR6 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
++VAR8 (SETLANE, vmul_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
+ VAR6 (MAC_LANE, vmla_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+ VAR2 (MAC_LANE, vmlals_lane, v4hi, v2si)
+ VAR2 (MAC_LANE, vmlalu_lane, v4hi, v2si)
+@@ -186,7 +227,7 @@ VAR6 (MAC_LANE, vmls_lane, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+ VAR2 (MAC_LANE, vmlsls_lane, v4hi, v2si)
+ VAR2 (MAC_LANE, vmlslu_lane, v4hi, v2si)
+ VAR2 (MAC_LANE, vqdmlsl_lane, v4hi, v2si)
+-VAR6 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
++VAR8 (BINOP, vmul_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf, v4hf, v8hf)
+ VAR6 (MAC_N, vmla_n, v4hi, v2si, v2sf, v8hi, v4si, v4sf)
+ VAR2 (MAC_N, vmlals_n, v4hi, v2si)
+ VAR2 (MAC_N, vmlalu_n, v4hi, v2si)
+@@ -197,17 +238,27 @@ VAR2 (MAC_N, vmlslu_n, v4hi, v2si)
+ VAR2 (MAC_N, vqdmlsl_n, v4hi, v2si)
+ VAR10 (SETLANE, vext,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
++VAR2 (SETLANE, vext, v8hf, v4hf)
+ VAR8 (UNOP, vrev64, v8qi, v4hi, v2si, v2sf, v16qi, v8hi, v4si, v4sf)
+ VAR4 (UNOP, vrev32, v8qi, v4hi, v16qi, v8hi)
+ VAR2 (UNOP, vrev16, v8qi, v16qi)
+ VAR4 (UNOP, vcvts, v2si, v2sf, v4si, v4sf)
++VAR2 (UNOP, vcvts, v4hi, v8hi)
++VAR2 (UNOP, vcvts, v4hf, v8hf)
++VAR2 (UNOP, vcvtu, v4hi, v8hi)
++VAR2 (UNOP, vcvtu, v4hf, v8hf)
+ VAR4 (UNOP, vcvtu, v2si, v2sf, v4si, v4sf)
+ VAR4 (BINOP, vcvts_n, v2si, v2sf, v4si, v4sf)
+ VAR4 (BINOP, vcvtu_n, v2si, v2sf, v4si, v4sf)
++VAR2 (BINOP, vcvts_n, v4hf, v8hf)
++VAR2 (BINOP, vcvtu_n, v4hi, v8hi)
++VAR2 (BINOP, vcvts_n, v4hi, v8hi)
++VAR2 (BINOP, vcvtu_n, v4hf, v8hf)
+ VAR1 (UNOP, vcvtv4sf, v4hf)
+ VAR1 (UNOP, vcvtv4hf, v4sf)
+ VAR10 (TERNOP, vbsl,
+ v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di)
++VAR2 (TERNOP, vbsl, v8hf, v4hf)
+ VAR2 (UNOP, copysignf, v2sf, v4sf)
+ VAR2 (UNOP, vrintn, v2sf, v4sf)
+ VAR2 (UNOP, vrinta, v2sf, v4sf)
+@@ -219,6 +270,14 @@ VAR1 (UNOP, vcvtav2sf, v2si)
+ VAR1 (UNOP, vcvtav4sf, v4si)
+ VAR1 (UNOP, vcvtauv2sf, v2si)
+ VAR1 (UNOP, vcvtauv4sf, v4si)
++VAR2 (UNOP, vcvtas, v4hf, v8hf)
++VAR2 (UNOP, vcvtau, v4hf, v8hf)
++VAR2 (UNOP, vcvtms, v4hf, v8hf)
++VAR2 (UNOP, vcvtmu, v4hf, v8hf)
++VAR2 (UNOP, vcvtns, v4hf, v8hf)
++VAR2 (UNOP, vcvtnu, v4hf, v8hf)
++VAR2 (UNOP, vcvtps, v4hf, v8hf)
++VAR2 (UNOP, vcvtpu, v4hf, v8hf)
+ VAR1 (UNOP, vcvtpv2sf, v2si)
+ VAR1 (UNOP, vcvtpv4sf, v4si)
+ VAR1 (UNOP, vcvtpuv2sf, v2si)
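
Each VARn entry above yields one builtin per listed mode, named __builtin_neon_<name><mode>: the new v4hf entry for vmul_n, for example, provides __builtin_neon_vmul_nv4hf, which backs vmul_n_f16 in arm_neon.h. A small sketch of the two equivalent spellings that land on these builtins (function names are illustrative):

#include <arm_neon.h>

/* Scale a 4-lane half-precision vector by a scalar using the _n form.  */
float16x4_t
scale4 (float16x4_t v, float16_t s)
{
  return vmul_n_f16 (v, s);              /* __builtin_neon_vmul_nv4hf */
}

/* Equivalent: broadcast the scalar, then multiply lane-wise.  Both forms
   require the FP16 vector arithmetic extension.  */
float16x4_t
scale4_dup (float16x4_t v, float16_t s)
{
  return vmul_f16 (v, vdup_n_f16 (s));   /* vdup_nv4hf + vmulfv4hf */
}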
+--- /dev/null
++++ b/src/gcc/config/arm/arm_vfp_builtins.def
+@@ -0,0 +1,51 @@
++/* VFP instruction builtin definitions.
++ Copyright (C) 2016 Free Software Foundation, Inc.
++ Contributed by ARM Ltd.
++ This file is part of GCC.
++
++ GCC is free software; you can redistribute it and/or modify it
++ under the terms of the GNU General Public License as published
++ by the Free Software Foundation; either version 3, or (at your
++ option) any later version.
++
++ GCC is distributed in the hope that it will be useful, but WITHOUT
++ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
++ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
++ License for more details.
++
++ You should have received a copy of the GNU General Public License
++ along with GCC; see the file COPYING3. If not see
++ <http://www.gnu.org/licenses/>. */
++
++/* This file lists the builtins that may be available when VFP is enabled
++   but NEON is not.  The entries otherwise have the same requirements and
++   generate the same structures as those in arm_neon_builtins.def.  */
++
++/* FP16 Arithmetic instructions. */
++VAR1 (UNOP, vabs, hf)
++VAR2 (UNOP, vcvths, hf, si)
++VAR2 (UNOP, vcvthu, hf, si)
++VAR1 (UNOP, vcvtahs, si)
++VAR1 (UNOP, vcvtahu, si)
++VAR1 (UNOP, vcvtmhs, si)
++VAR1 (UNOP, vcvtmhu, si)
++VAR1 (UNOP, vcvtnhs, si)
++VAR1 (UNOP, vcvtnhu, si)
++VAR1 (UNOP, vcvtphs, si)
++VAR1 (UNOP, vcvtphu, si)
++VAR1 (UNOP, vrnd, hf)
++VAR1 (UNOP, vrnda, hf)
++VAR1 (UNOP, vrndi, hf)
++VAR1 (UNOP, vrndm, hf)
++VAR1 (UNOP, vrndn, hf)
++VAR1 (UNOP, vrndp, hf)
++VAR1 (UNOP, vrndx, hf)
++VAR1 (UNOP, vsqrt, hf)
++
++VAR2 (BINOP, vcvths_n, hf, si)
++VAR2 (BINOP, vcvthu_n, hf, si)
++VAR1 (BINOP, vmaxnm, hf)
++VAR1 (BINOP, vminnm, hf)
++
++VAR1 (TERNOP, vfma, hf)
++VAR1 (TERNOP, vfms, hf)
+--- a/src/gcc/config/arm/bpabi.h
++++ b/src/gcc/config/arm/bpabi.h
+@@ -75,6 +75,9 @@
+ |mcpu=cortex-a57.cortex-a53 \
+ |mcpu=cortex-a72 \
+ |mcpu=cortex-a72.cortex-a53 \
++ |mcpu=cortex-a73 \
++ |mcpu=cortex-a73.cortex-a35 \
++ |mcpu=cortex-a73.cortex-a53 \
+ |mcpu=exynos-m1 \
+ |mcpu=qdf24xx \
+ |mcpu=xgene1 \
+@@ -90,6 +93,11 @@
+ |march=armv8-a+crc \
+ |march=armv8.1-a \
+ |march=armv8.1-a+crc \
++ |march=armv8.2-a \
++ |march=armv8.2-a+fp16 \
++ |march=armv8-m.base \
++ |march=armv8-m.main \
++ |march=armv8-m.main+dsp \
+ :%{!r:--be8}}}"
+ #else
+ #define BE8_LINK_SPEC \
+@@ -105,6 +113,9 @@
+ |mcpu=cortex-a57.cortex-a53 \
+ |mcpu=cortex-a72 \
+ |mcpu=cortex-a72.cortex-a53 \
++ |mcpu=cortex-a73 \
++ |mcpu=cortex-a73.cortex-a35 \
++ |mcpu=cortex-a73.cortex-a53 \
+ |mcpu=exynos-m1 \
+ |mcpu=qdf24xx \
+ |mcpu=xgene1 \
+@@ -121,6 +132,11 @@
+ |march=armv8-a+crc \
+ |march=armv8.1-a \
+ |march=armv8.1-a+crc \
++ |march=armv8.2-a \
++ |march=armv8.2-a+fp16 \
++ |march=armv8-m.base \
++ |march=armv8-m.main \
++ |march=armv8-m.main+dsp \
+ :%{!r:--be8}}}"
+ #endif
+
+--- a/src/gcc/config/arm/constraints.md
++++ b/src/gcc/config/arm/constraints.md
+@@ -66,7 +66,7 @@
+
+ (define_constraint "j"
+ "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
+- (and (match_test "TARGET_32BIT && arm_arch_thumb2")
++ (and (match_test "TARGET_HAVE_MOVT")
+ (ior (and (match_code "high")
+ (match_test "arm_valid_symbolic_address_p (XEXP (op, 0))"))
+ (and (match_code "const_int")
+--- a/src/gcc/config/arm/cortex-a53.md
++++ b/src/gcc/config/arm/cortex-a53.md
+@@ -30,6 +30,7 @@
+
+ (define_cpu_unit "cortex_a53_slot0" "cortex_a53")
+ (define_cpu_unit "cortex_a53_slot1" "cortex_a53")
++(final_presence_set "cortex_a53_slot1" "cortex_a53_slot0")
+
+ (define_reservation "cortex_a53_slot_any"
+ "cortex_a53_slot0\
+@@ -71,41 +72,43 @@
+
+ (define_insn_reservation "cortex_a53_shift" 2
+ (and (eq_attr "tune" "cortexa53")
+- (eq_attr "type" "adr,shift_imm,shift_reg,mov_imm,mvn_imm"))
++ (eq_attr "type" "adr,shift_imm,mov_imm,mvn_imm,mov_shift"))
+ "cortex_a53_slot_any")
+
+-(define_insn_reservation "cortex_a53_alu_rotate_imm" 2
++(define_insn_reservation "cortex_a53_shift_reg" 2
+ (and (eq_attr "tune" "cortexa53")
+- (eq_attr "type" "rotate_imm"))
+- "(cortex_a53_slot1)
+- | (cortex_a53_single_issue)")
++ (eq_attr "type" "shift_reg,mov_shift_reg"))
++ "cortex_a53_slot_any+cortex_a53_hazard")
+
+ (define_insn_reservation "cortex_a53_alu" 3
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,
+ alu_sreg,alus_sreg,logic_reg,logics_reg,
+ adc_imm,adcs_imm,adc_reg,adcs_reg,
+- bfm,csel,clz,rbit,rev,alu_dsp_reg,
+- mov_reg,mvn_reg,
+- mrs,multiple,no_insn"))
++ csel,clz,rbit,rev,alu_dsp_reg,
++ mov_reg,mvn_reg,mrs,multiple,no_insn"))
+ "cortex_a53_slot_any")
+
+ (define_insn_reservation "cortex_a53_alu_shift" 3
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "alu_shift_imm,alus_shift_imm,
+ crc,logic_shift_imm,logics_shift_imm,
+- alu_ext,alus_ext,
+- extend,mov_shift,mvn_shift"))
++ alu_ext,alus_ext,bfm,extend,mvn_shift"))
+ "cortex_a53_slot_any")
+
+ (define_insn_reservation "cortex_a53_alu_shift_reg" 3
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "alu_shift_reg,alus_shift_reg,
+ logic_shift_reg,logics_shift_reg,
+- mov_shift_reg,mvn_shift_reg"))
++ mvn_shift_reg"))
+ "cortex_a53_slot_any+cortex_a53_hazard")
+
+-(define_insn_reservation "cortex_a53_mul" 3
++(define_insn_reservation "cortex_a53_alu_extr" 3
++ (and (eq_attr "tune" "cortexa53")
++ (eq_attr "type" "rotate_imm"))
++ "cortex_a53_slot1|cortex_a53_single_issue")
++
++(define_insn_reservation "cortex_a53_mul" 4
+ (and (eq_attr "tune" "cortexa53")
+ (ior (eq_attr "mul32" "yes")
+ (eq_attr "mul64" "yes")))
+@@ -189,49 +192,43 @@
+ (define_insn_reservation "cortex_a53_branch" 0
+ (and (eq_attr "tune" "cortexa53")
+ (eq_attr "type" "branch,call"))
+- "cortex_a53_slot_any,cortex_a53_branch")
++ "cortex_a53_slot_any+cortex_a53_branch")
+
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ ;; General-purpose register bypasses
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+-;; Model bypasses for unshifted operands to ALU instructions.
++;; Model bypasses for ALU to ALU instructions.
+
+-(define_bypass 1 "cortex_a53_shift"
+- "cortex_a53_shift")
++(define_bypass 0 "cortex_a53_shift*"
++ "cortex_a53_alu")
+
+-(define_bypass 1 "cortex_a53_alu,
+- cortex_a53_alu_shift*,
+- cortex_a53_alu_rotate_imm,
+- cortex_a53_shift"
++(define_bypass 1 "cortex_a53_shift*"
++ "cortex_a53_shift*,cortex_a53_alu_*")
++
++(define_bypass 1 "cortex_a53_alu*"
+ "cortex_a53_alu")
+
+-(define_bypass 2 "cortex_a53_alu,
+- cortex_a53_alu_shift*"
++(define_bypass 1 "cortex_a53_alu*"
+ "cortex_a53_alu_shift*"
+ "aarch_forward_to_shift_is_not_shifted_reg")
+
+-;; In our model, we allow any general-purpose register operation to
+-;; bypass to the accumulator operand of an integer MADD-like operation.
++(define_bypass 2 "cortex_a53_alu*"
++ "cortex_a53_alu_*,cortex_a53_shift*")
+
+-(define_bypass 1 "cortex_a53_alu*,
+- cortex_a53_load*,
+- cortex_a53_mul"
++;; Model a bypass from MUL/MLA to MLA instructions.
++
++(define_bypass 1 "cortex_a53_mul"
+ "cortex_a53_mul"
+ "aarch_accumulator_forwarding")
+
+-;; Model a bypass from MLA/MUL to many ALU instructions.
++;; Model a bypass from MUL/MLA to ALU instructions.
+
+ (define_bypass 2 "cortex_a53_mul"
+- "cortex_a53_alu,
+- cortex_a53_alu_shift*")
+-
+-;; We get neater schedules by allowing an MLA/MUL to feed an
+-;; early load address dependency to a load.
++ "cortex_a53_alu")
+
+-(define_bypass 2 "cortex_a53_mul"
+- "cortex_a53_load*"
+- "arm_early_load_addr_dep")
++(define_bypass 3 "cortex_a53_mul"
++ "cortex_a53_alu_*,cortex_a53_shift*")
+
+ ;; Model bypasses for loads which are to be consumed by the ALU.
+
+@@ -239,47 +236,37 @@
+ "cortex_a53_alu")
+
+ (define_bypass 3 "cortex_a53_load1"
+- "cortex_a53_alu_shift*")
++ "cortex_a53_alu_*,cortex_a53_shift*")
++
++(define_bypass 3 "cortex_a53_load2"
++ "cortex_a53_alu")
+
+ ;; Model a bypass for ALU instructions feeding stores.
+
+-(define_bypass 1 "cortex_a53_alu*"
+- "cortex_a53_store1,
+- cortex_a53_store2,
+- cortex_a53_store3plus"
++(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
++ "cortex_a53_store*"
+ "arm_no_early_store_addr_dep")
+
+ ;; Model a bypass for load and multiply instructions feeding stores.
+
+-(define_bypass 2 "cortex_a53_mul,
+- cortex_a53_load1,
+- cortex_a53_load2,
+- cortex_a53_load3plus"
+- "cortex_a53_store1,
+- cortex_a53_store2,
+- cortex_a53_store3plus"
++(define_bypass 1 "cortex_a53_mul,
++ cortex_a53_load*"
++ "cortex_a53_store*"
+ "arm_no_early_store_addr_dep")
+
+ ;; Model a GP->FP register move as similar to stores.
+
+-(define_bypass 1 "cortex_a53_alu*"
++(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
+ "cortex_a53_r2f")
+
+-(define_bypass 2 "cortex_a53_mul,
+- cortex_a53_load1,
+- cortex_a53_load2,
+- cortex_a53_load3plus"
++(define_bypass 1 "cortex_a53_mul,
++ cortex_a53_load*"
+ "cortex_a53_r2f")
+
+-;; Shifts feeding Load/Store addresses may not be ready in time.
++;; Model flag forwarding to branches.
+
+-(define_bypass 3 "cortex_a53_shift"
+- "cortex_a53_load*"
+- "arm_early_load_addr_dep")
+-
+-(define_bypass 3 "cortex_a53_shift"
+- "cortex_a53_store*"
+- "arm_early_store_addr_dep")
++(define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
++ "cortex_a53_branch")
+
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ ;; Floating-point/Advanced SIMD.
+--- a/src/gcc/config/arm/cortex-a57.md
++++ b/src/gcc/config/arm/cortex-a57.md
+@@ -297,7 +297,7 @@
+ (eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,\
+ alu_sreg,alus_sreg,logic_reg,logics_reg,\
+ adc_imm,adcs_imm,adc_reg,adcs_reg,\
+- adr,bfm,clz,rbit,rev,alu_dsp_reg,\
++ adr,bfm,clz,csel,rbit,rev,alu_dsp_reg,\
+ rotate_imm,shift_imm,shift_reg,\
+ mov_imm,mov_reg,\
+ mvn_imm,mvn_reg,\
+@@ -726,7 +726,7 @@
+
+ (define_insn_reservation "cortex_a57_fp_cpys" 4
+ (and (eq_attr "tune" "cortexa57")
+- (eq_attr "type" "fmov"))
++ (eq_attr "type" "fmov,fcsel"))
+ "(ca57_cx1|ca57_cx2)")
+
+ (define_insn_reservation "cortex_a57_fp_divs" 12
+--- a/src/gcc/config/arm/cortex-a8-neon.md
++++ b/src/gcc/config/arm/cortex-a8-neon.md
+@@ -357,30 +357,34 @@
+ (eq_attr "type" "fmuls"))
+ "cortex_a8_vfp,cortex_a8_vfplite*11")
+
++;; Don't model a reservation for more than 15 cycles as this explodes the
++;; state space of the automaton for little gain. It is unlikely that the
++;; scheduler will find enough instructions to hide the full latency of the
++;; instructions.
+ (define_insn_reservation "cortex_a8_vfp_muld" 17
+ (and (eq_attr "tune" "cortexa8")
+ (eq_attr "type" "fmuld"))
+- "cortex_a8_vfp,cortex_a8_vfplite*16")
++ "cortex_a8_vfp,cortex_a8_vfplite*15")
+
+ (define_insn_reservation "cortex_a8_vfp_macs" 21
+ (and (eq_attr "tune" "cortexa8")
+ (eq_attr "type" "fmacs,ffmas"))
+- "cortex_a8_vfp,cortex_a8_vfplite*20")
++ "cortex_a8_vfp,cortex_a8_vfplite*15")
+
+ (define_insn_reservation "cortex_a8_vfp_macd" 26
+ (and (eq_attr "tune" "cortexa8")
+ (eq_attr "type" "fmacd,ffmad"))
+- "cortex_a8_vfp,cortex_a8_vfplite*25")
++ "cortex_a8_vfp,cortex_a8_vfplite*15")
+
+ (define_insn_reservation "cortex_a8_vfp_divs" 37
+ (and (eq_attr "tune" "cortexa8")
+ (eq_attr "type" "fdivs, fsqrts"))
+- "cortex_a8_vfp,cortex_a8_vfplite*36")
++ "cortex_a8_vfp,cortex_a8_vfplite*15")
+
+ (define_insn_reservation "cortex_a8_vfp_divd" 65
+ (and (eq_attr "tune" "cortexa8")
+ (eq_attr "type" "fdivd, fsqrtd"))
+- "cortex_a8_vfp,cortex_a8_vfplite*64")
++ "cortex_a8_vfp,cortex_a8_vfplite*15")
+
+ ;; Comparisons can actually take 7 cycles sometimes instead of four,
+ ;; but given all the other instructions lumped into type=ffarith that
+--- a/src/gcc/config/arm/crypto.md
++++ b/src/gcc/config/arm/crypto.md
+@@ -18,14 +18,27 @@
+ ;; along with GCC; see the file COPYING3. If not see
+ ;; <http://www.gnu.org/licenses/>.
+
++
++;; When AES/AESMC fusion is enabled we want the register allocation to
++;; look like:
++;; AESE Vn, _
++;; AESMC Vn, Vn
++;; So prefer to tie operand 1 to operand 0 when fusing.
++
+ (define_insn "crypto_<crypto_pattern>"
+- [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
++ [(set (match_operand:<crypto_mode> 0 "register_operand" "=w,w")
+ (unspec:<crypto_mode> [(match_operand:<crypto_mode> 1
+- "register_operand" "w")]
++ "register_operand" "0,w")]
+ CRYPTO_UNARY))]
+ "TARGET_CRYPTO"
+ "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q1"
+- [(set_attr "type" "<crypto_type>")]
++ [(set_attr "type" "<crypto_type>")
++ (set_attr_alternative "enabled"
++ [(if_then_else (match_test
++ "arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)")
++ (const_string "yes" )
++ (const_string "no"))
++ (const_string "yes")])]
+ )
+
+ (define_insn "crypto_<crypto_pattern>"
+--- a/src/gcc/config/arm/driver-arm.c
++++ b/src/gcc/config/arm/driver-arm.c
+@@ -46,6 +46,12 @@ static struct vendor_cpu arm_cpu_table[] = {
+ {"0xc0d", "armv7ve", "cortex-a12"},
+ {"0xc0e", "armv7ve", "cortex-a17"},
+ {"0xc0f", "armv7ve", "cortex-a15"},
++ {"0xd01", "armv8-a+crc", "cortex-a32"},
++ {"0xd04", "armv8-a+crc", "cortex-a35"},
++ {"0xd03", "armv8-a+crc", "cortex-a53"},
++ {"0xd07", "armv8-a+crc", "cortex-a57"},
++ {"0xd08", "armv8-a+crc", "cortex-a72"},
++ {"0xd09", "armv8-a+crc", "cortex-a73"},
+ {"0xc14", "armv7-r", "cortex-r4"},
+ {"0xc15", "armv7-r", "cortex-r5"},
+ {"0xc20", "armv6-m", "cortex-m0"},
+--- a/src/gcc/config/arm/elf.h
++++ b/src/gcc/config/arm/elf.h
+@@ -148,8 +148,9 @@
+ while (0)
+
+ /* Horrible hack: We want to prevent some libgcc routines being included
+- for some multilibs. */
+-#ifndef __ARM_ARCH_6M__
++ for some multilibs. The condition should match the one in
++ libgcc/config/arm/lib1funcs.S. */
++#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
+ #undef L_fixdfsi
+ #undef L_fixunsdfsi
+ #undef L_truncdfsf2
+--- a/src/gcc/config/arm/iterators.md
++++ b/src/gcc/config/arm/iterators.md
+@@ -119,6 +119,10 @@
+ ;; All supported vector modes (except those with 64-bit integer elements).
+ (define_mode_iterator VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF])
+
++;; All supported vector modes including 16-bit float modes.
++(define_mode_iterator VDQWH [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF
++ V8HF V4HF])
++
+ ;; Supported integer vector modes (not 64 bit elements).
+ (define_mode_iterator VDQIW [V8QI V16QI V4HI V8HI V2SI V4SI])
+
+@@ -141,6 +145,9 @@
+ ;; Vector modes form int->float conversions.
+ (define_mode_iterator VCVTI [V2SI V4SI])
+
++;; Vector modes for int->half conversions.
++(define_mode_iterator VCVTHI [V4HI V8HI])
++
+ ;; Vector modes for doubleword multiply-accumulate, etc. insns.
+ (define_mode_iterator VMD [V4HI V2SI V2SF])
+
+@@ -174,6 +181,9 @@
+ ;; Modes with 8-bit, 16-bit and 32-bit elements.
+ (define_mode_iterator VU [V16QI V8HI V4SI])
+
++;; Vector modes for 16-bit floating-point support.
++(define_mode_iterator VH [V8HF V4HF])
++
+ ;; Iterators used for fixed-point support.
+ (define_mode_iterator FIXED [QQ HQ SQ UQQ UHQ USQ HA SA UHA USA])
+
+@@ -192,14 +202,17 @@
+ ;; Code iterators
+ ;;----------------------------------------------------------------------------
+
+-;; A list of condition codes used in compare instructions where
+-;; the carry flag from the addition is used instead of doing the
++;; A list of condition codes used in compare instructions where
++;; the carry flag from the addition is used instead of doing the
+ ;; compare a second time.
+ (define_code_iterator LTUGEU [ltu geu])
+
+ ;; The signed gt, ge comparisons
+ (define_code_iterator GTGE [gt ge])
+
++;; The signed gt, ge, lt, le comparisons
++(define_code_iterator GLTE [gt ge lt le])
++
+ ;; The unsigned gt, ge comparisons
+ (define_code_iterator GTUGEU [gtu geu])
+
+@@ -228,6 +241,12 @@
+ ;; Binary operators whose second operand can be shifted.
+ (define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
+
++;; Operations on the sign of a number.
++(define_code_iterator ABSNEG [abs neg])
++
++;; Conversions.
++(define_code_iterator FCVT [unsigned_float float])
++
+ ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
+ ;; a stack pointer opoerand. The minus operation is a candidate for an rsub
+ ;; and hence only plus is supported.
+@@ -251,10 +270,14 @@
+ (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
+ UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
+
+-(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
++(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
++ UNSPEC_VCLT UNSPEC_VCLE])
+
+ (define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
+
++(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
++ UNSPEC_VCALE UNSPEC_VCALT])
++
+ (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
+
+ (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
+@@ -323,6 +346,22 @@
+
+ (define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
+
++(define_int_iterator VCVT_HF_US_N [UNSPEC_VCVT_HF_S_N UNSPEC_VCVT_HF_U_N])
++
++(define_int_iterator VCVT_SI_US_N [UNSPEC_VCVT_SI_S_N UNSPEC_VCVT_SI_U_N])
++
++(define_int_iterator VCVT_HF_US [UNSPEC_VCVTA_S UNSPEC_VCVTA_U
++ UNSPEC_VCVTM_S UNSPEC_VCVTM_U
++ UNSPEC_VCVTN_S UNSPEC_VCVTN_U
++ UNSPEC_VCVTP_S UNSPEC_VCVTP_U])
++
++(define_int_iterator VCVTH_US [UNSPEC_VCVTH_S UNSPEC_VCVTH_U])
++
++;; Operators for FP16 instructions.
++(define_int_iterator FP16_RND [UNSPEC_VRND UNSPEC_VRNDA
++ UNSPEC_VRNDM UNSPEC_VRNDN
++ UNSPEC_VRNDP UNSPEC_VRNDX])
++
+ (define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
+
+ (define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
+@@ -366,6 +405,8 @@
+
+ (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
++(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
++
+ ;;----------------------------------------------------------------------------
+ ;; Mode attributes
+ ;;----------------------------------------------------------------------------
+@@ -384,6 +425,10 @@
+ (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
+ (V4SI "v4sf") (V4SF "v4si")])
+
++;; (Opposite) mode to convert to/from for vector-half mode conversions.
++(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
++ (V8HI "V8HF") (V8HF "V8HI")])
++
+ ;; Define element mode for each vector mode.
+ (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
+ (V4HI "HI") (V8HI "HI")
+@@ -427,12 +472,13 @@
+
+ ;; Register width from element mode
+ (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
+- (V4HI "P") (V8HI "q")
+- (V4HF "P") (V8HF "q")
+- (V2SI "P") (V4SI "q")
+- (V2SF "P") (V4SF "q")
+- (DI "P") (V2DI "q")
+- (SF "") (DF "P")])
++ (V4HI "P") (V8HI "q")
++ (V4HF "P") (V8HF "q")
++ (V2SI "P") (V4SI "q")
++ (V2SF "P") (V4SF "q")
++ (DI "P") (V2DI "q")
++ (SF "") (DF "P")
++ (HF "")])
+
+ ;; Wider modes with the same number of elements.
+ (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
+@@ -448,7 +494,7 @@
+ (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
+ (V8HF "V4HF") (V4SI "V2SI")
+ (V4SF "V2SF") (V2DF "DF")
+- (V2DI "DI")])
++ (V2DI "DI") (V4HF "HF")])
+
+ ;; Same, but lower-case.
+ (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
+@@ -475,9 +521,10 @@
+ ;; Used for neon_vdup_lane, where the second operand is double-sized
+ ;; even when the first one is quad.
+ (define_mode_attr V_double_vector_mode [(V16QI "V8QI") (V8HI "V4HI")
+- (V4SI "V2SI") (V4SF "V2SF")
+- (V8QI "V8QI") (V4HI "V4HI")
+- (V2SI "V2SI") (V2SF "V2SF")])
++ (V4SI "V2SI") (V4SF "V2SF")
++ (V8QI "V8QI") (V4HI "V4HI")
++ (V2SI "V2SI") (V2SF "V2SF")
++ (V8HF "V4HF") (V4HF "V4HF")])
+
+ ;; Mode of result of comparison operations (and bit-select operand 1).
+ (define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
+@@ -496,18 +543,22 @@
+ ;; Get element type from double-width mode, for operations where we
+ ;; don't care about signedness.
+ (define_mode_attr V_if_elem [(V8QI "i8") (V16QI "i8")
+- (V4HI "i16") (V8HI "i16")
+- (V2SI "i32") (V4SI "i32")
+- (DI "i64") (V2DI "i64")
+- (V2SF "f32") (V4SF "f32")
+- (SF "f32") (DF "f64")])
++ (V4HI "i16") (V8HI "i16")
++ (V2SI "i32") (V4SI "i32")
++ (DI "i64") (V2DI "i64")
++ (V2SF "f32") (V4SF "f32")
++ (SF "f32") (DF "f64")
++ (HF "f16") (V4HF "f16")
++ (V8HF "f16")])
+
+ ;; Same, but for operations which work on signed values.
+ (define_mode_attr V_s_elem [(V8QI "s8") (V16QI "s8")
+- (V4HI "s16") (V8HI "s16")
+- (V2SI "s32") (V4SI "s32")
+- (DI "s64") (V2DI "s64")
+- (V2SF "f32") (V4SF "f32")])
++ (V4HI "s16") (V8HI "s16")
++ (V2SI "s32") (V4SI "s32")
++ (DI "s64") (V2DI "s64")
++ (V2SF "f32") (V4SF "f32")
++ (HF "f16") (V4HF "f16")
++ (V8HF "f16")])
+
+ ;; Same, but for operations which work on unsigned values.
+ (define_mode_attr V_u_elem [(V8QI "u8") (V16QI "u8")
+@@ -524,17 +575,22 @@
+ (V2SF "32") (V4SF "32")])
+
+ (define_mode_attr V_sz_elem [(V8QI "8") (V16QI "8")
+- (V4HI "16") (V8HI "16")
+- (V2SI "32") (V4SI "32")
+- (DI "64") (V2DI "64")
++ (V4HI "16") (V8HI "16")
++ (V2SI "32") (V4SI "32")
++ (DI "64") (V2DI "64")
+ (V4HF "16") (V8HF "16")
+- (V2SF "32") (V4SF "32")])
++ (V2SF "32") (V4SF "32")])
+
+ (define_mode_attr V_elem_ch [(V8QI "b") (V16QI "b")
+- (V4HI "h") (V8HI "h")
+- (V2SI "s") (V4SI "s")
+- (DI "d") (V2DI "d")
+- (V2SF "s") (V4SF "s")])
++ (V4HI "h") (V8HI "h")
++ (V2SI "s") (V4SI "s")
++ (DI "d") (V2DI "d")
++ (V2SF "s") (V4SF "s")
++ (V2SF "s") (V4SF "s")])
++
++(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI "s")
++ (V4HF "s") (V8HF "s")
++ (HF "s")])
+
+ ;; Element sizes for duplicating ARM registers to all elements of a vector.
+ (define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
+@@ -570,29 +626,30 @@
+ ;; This mode attribute is used to obtain the correct register constraints.
+
+ (define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
+- (V8HI "x") (V4SI "t") (V4SF "t")])
++ (V8HI "x") (V4SI "t") (V4SF "t")
++ (V8HF "x") (V4HF "x")])
+
+ ;; Predicates used for setting type for neon instructions
+
+ (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
+- (V4HI "false") (V8HI "false")
+- (V2SI "false") (V4SI "false")
+- (V4HF "true") (V8HF "true")
+- (V2SF "true") (V4SF "true")
+- (DI "false") (V2DI "false")])
++ (V4HI "false") (V8HI "false")
++ (V2SI "false") (V4SI "false")
++ (V4HF "true") (V8HF "true")
++ (V2SF "true") (V4SF "true")
++ (DI "false") (V2DI "false")])
+
+ (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
+- (V4HI "true") (V8HI "true")
+- (V2SI "false") (V4SI "false")
+- (V2SF "false") (V4SF "false")
+- (DI "false") (V2DI "false")])
+-
++ (V4HI "true") (V8HI "true")
++ (V2SI "false") (V4SI "false")
++ (V2SF "false") (V4SF "false")
++ (DI "false") (V2DI "false")])
+
+ (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
+- (V4HI "true") (V8HI "false")
+- (V2SI "true") (V4SI "false")
+- (V2SF "true") (V4SF "false")
+- (DI "true") (V2DI "false")])
++ (V4HI "true") (V8HI "false")
++ (V2SI "true") (V4SI "false")
++ (V2SF "true") (V4SF "false")
++ (DI "true") (V2DI "false")
++ (V4HF "true") (V8HF "false")])
+
+ (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
+ (V4HF "4") (V8HF "8")
+@@ -637,12 +694,14 @@
+
+ ;; Mode attribute used to build the "type" attribute.
+ (define_mode_attr q [(V8QI "") (V16QI "_q")
+- (V4HI "") (V8HI "_q")
+- (V2SI "") (V4SI "_q")
++ (V4HI "") (V8HI "_q")
++ (V2SI "") (V4SI "_q")
++ (V4HF "") (V8HF "_q")
++ (V2SF "") (V4SF "_q")
+ (V4HF "") (V8HF "_q")
+- (V2SF "") (V4SF "_q")
+- (DI "") (V2DI "_q")
+- (DF "") (V2DF "_q")])
++ (DI "") (V2DI "_q")
++ (DF "") (V2DF "_q")
++ (HF "")])
+
+ (define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
+
+@@ -679,6 +738,16 @@
+ (define_code_attr shift [(ashiftrt "ashr") (lshiftrt "lshr")])
+ (define_code_attr shifttype [(ashiftrt "signed") (lshiftrt "unsigned")])
+
++;; String representations of operations on the sign of a number.
++(define_code_attr absneg_str [(abs "abs") (neg "neg")])
++
++;; Conversions.
++(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
++
++(define_code_attr float_sup [(unsigned_float "u") (float "s")])
++
++(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
++
+ ;;----------------------------------------------------------------------------
+ ;; Int attributes
+ ;;----------------------------------------------------------------------------
+@@ -710,7 +779,13 @@
+ (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
+ (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
+ (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
++ (UNSPEC_VCVTA_S "s") (UNSPEC_VCVTA_U "u")
++ (UNSPEC_VCVTM_S "s") (UNSPEC_VCVTM_U "u")
++ (UNSPEC_VCVTN_S "s") (UNSPEC_VCVTN_U "u")
++ (UNSPEC_VCVTP_S "s") (UNSPEC_VCVTP_U "u")
+ (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
++ (UNSPEC_VCVT_HF_S_N "s") (UNSPEC_VCVT_HF_U_N "u")
++ (UNSPEC_VCVT_SI_S_N "s") (UNSPEC_VCVT_SI_U_N "u")
+ (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
+ (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
+ (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
+@@ -725,13 +800,30 @@
+ (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
+ (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
+ (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
+-
++ (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
+ ])
+
++(define_int_attr vcvth_op
++ [(UNSPEC_VCVTA_S "a") (UNSPEC_VCVTA_U "a")
++ (UNSPEC_VCVTM_S "m") (UNSPEC_VCVTM_U "m")
++ (UNSPEC_VCVTN_S "n") (UNSPEC_VCVTN_U "n")
++ (UNSPEC_VCVTP_S "p") (UNSPEC_VCVTP_U "p")])
++
++(define_int_attr fp16_rnd_str
++ [(UNSPEC_VRND "rnd") (UNSPEC_VRNDA "rnda")
++ (UNSPEC_VRNDM "rndm") (UNSPEC_VRNDN "rndn")
++ (UNSPEC_VRNDP "rndp") (UNSPEC_VRNDX "rndx")])
++
++(define_int_attr fp16_rnd_insn
++ [(UNSPEC_VRND "vrintz") (UNSPEC_VRNDA "vrinta")
++ (UNSPEC_VRNDM "vrintm") (UNSPEC_VRNDN "vrintn")
++ (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
++
+ (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
+- (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+- (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+- (UNSPEC_VCAGT "gt")])
++ (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
++ (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
++ (UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
++ (UNSPEC_VCALT "lt")])
+
+ (define_int_attr r [
+ (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
+@@ -847,3 +939,7 @@
+
+ ;; Attributes for VQRDMLAH/VQRDMLSH
+ (define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
++
++;; Attributes for VFMA_LANE/ VFMS_LANE
++(define_int_attr neon_vfm_lane_as
++ [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
+--- a/src/gcc/config/arm/neon-testgen.ml
++++ b/src//dev/null
+@@ -1,324 +0,0 @@
+-(* Auto-generate ARM Neon intrinsics tests.
+- Copyright (C) 2006-2016 Free Software Foundation, Inc.
+- Contributed by CodeSourcery.
+-
+- This file is part of GCC.
+-
+- GCC is free software; you can redistribute it and/or modify it under
+- the terms of the GNU General Public License as published by the Free
+- Software Foundation; either version 3, or (at your option) any later
+- version.
+-
+- GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+- WARRANTY; without even the implied warranty of MERCHANTABILITY or
+- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+- for more details.
+-
+- You should have received a copy of the GNU General Public License
+- along with GCC; see the file COPYING3. If not see
+- <http://www.gnu.org/licenses/>.
+-
+- This is an O'Caml program. The O'Caml compiler is available from:
+-
+- http://caml.inria.fr/
+-
+- Or from your favourite OS's friendly packaging system. Tested with version
+- 3.09.2, though other versions will probably work too.
+-
+- Compile with:
+- ocamlc -c neon.ml
+- ocamlc -o neon-testgen neon.cmo neon-testgen.ml
+-
+- Run with:
+- cd /path/to/gcc/testsuite/gcc.target/arm/neon
+- /path/to/neon-testgen
+-*)
+-
+-open Neon
+-
+-type c_type_flags = Pointer | Const
+-
+-(* Open a test source file. *)
+-let open_test_file dir name =
+- try
+- open_out (dir ^ "/" ^ name ^ ".c")
+- with Sys_error str ->
+- failwith ("Could not create test source file " ^ name ^ ": " ^ str)
+-
+-(* Emit prologue code to a test source file. *)
+-let emit_prologue chan test_name effective_target compile_test_optim =
+- Printf.fprintf chan "/* Test the `%s' ARM Neon intrinsic. */\n" test_name;
+- Printf.fprintf chan "/* This file was autogenerated by neon-testgen. */\n\n";
+- Printf.fprintf chan "/* { dg-do assemble } */\n";
+- Printf.fprintf chan "/* { dg-require-effective-target %s_ok } */\n"
+- effective_target;
+- Printf.fprintf chan "/* { dg-options \"-save-temps %s\" } */\n" compile_test_optim;
+- Printf.fprintf chan "/* { dg-add-options %s } */\n" effective_target;
+- Printf.fprintf chan "\n#include \"arm_neon.h\"\n\n"
+-
+-(* Emit declarations of variables that are going to be passed
+- to an intrinsic, together with one to take a returned value if needed. *)
+-let emit_variables chan c_types features spaces =
+- let emit () =
+- ignore (
+- List.fold_left (fun arg_number -> fun (flags, ty) ->
+- let pointer_bit =
+- if List.mem Pointer flags then "*" else ""
+- in
+- (* Const arguments to builtins are directly
+- written in as constants. *)
+- if not (List.mem Const flags) then
+- Printf.fprintf chan "%s%s %sarg%d_%s;\n"
+- spaces ty pointer_bit arg_number ty;
+- arg_number + 1)
+- 0 (List.tl c_types))
+- in
+- match c_types with
+- (_, return_ty) :: tys ->
+- if return_ty <> "void" then begin
+- (* The intrinsic returns a value. We need to do explicit register
+- allocation for vget_low tests or they fail because of copy
+- elimination. *)
+- ((if List.mem Fixed_vector_reg features then
+- Printf.fprintf chan "%sregister %s out_%s asm (\"d18\");\n"
+- spaces return_ty return_ty
+- else if List.mem Fixed_core_reg features then
+- Printf.fprintf chan "%sregister %s out_%s asm (\"r0\");\n"
+- spaces return_ty return_ty
+- else
+- Printf.fprintf chan "%s%s out_%s;\n" spaces return_ty return_ty);
+- emit ())
+- end else
+- (* The intrinsic does not return a value. *)
+- emit ()
+- | _ -> assert false
+-
+-(* Emit code to call an intrinsic. *)
+-let emit_call chan const_valuator c_types name elt_ty =
+- (if snd (List.hd c_types) <> "void" then
+- Printf.fprintf chan " out_%s = " (snd (List.hd c_types))
+- else
+- Printf.fprintf chan " ");
+- Printf.fprintf chan "%s_%s (" (intrinsic_name name) (string_of_elt elt_ty);
+- let print_arg chan arg_number (flags, ty) =
+- (* If the argument is of const type, then directly write in the
+- constant now. *)
+- if List.mem Const flags then
+- match const_valuator with
+- None ->
+- if List.mem Pointer flags then
+- Printf.fprintf chan "0"
+- else
+- Printf.fprintf chan "1"
+- | Some f -> Printf.fprintf chan "%s" (string_of_int (f arg_number))
+- else
+- Printf.fprintf chan "arg%d_%s" arg_number ty
+- in
+- let rec print_args arg_number tys =
+- match tys with
+- [] -> ()
+- | [ty] -> print_arg chan arg_number ty
+- | ty::tys ->
+- print_arg chan arg_number ty;
+- Printf.fprintf chan ", ";
+- print_args (arg_number + 1) tys
+- in
+- print_args 0 (List.tl c_types);
+- Printf.fprintf chan ");\n"
+-
+-(* Emit epilogue code to a test source file. *)
+-let emit_epilogue chan features regexps =
+- let no_op = List.exists (fun feature -> feature = No_op) features in
+- Printf.fprintf chan "}\n\n";
+- if not no_op then
+- List.iter (fun regexp ->
+- Printf.fprintf chan
+- "/* { dg-final { scan-assembler \"%s\" } } */\n" regexp)
+- regexps
+- else
+- ()
+-
+-
+-(* Check a list of C types to determine which ones are pointers and which
+- ones are const. *)
+-let check_types tys =
+- let tys' =
+- List.map (fun ty ->
+- let len = String.length ty in
+- if len > 2 && String.get ty (len - 2) = ' '
+- && String.get ty (len - 1) = '*'
+- then ([Pointer], String.sub ty 0 (len - 2))
+- else ([], ty)) tys
+- in
+- List.map (fun (flags, ty) ->
+- if String.length ty > 6 && String.sub ty 0 6 = "const "
+- then (Const :: flags, String.sub ty 6 ((String.length ty) - 6))
+- else (flags, ty)) tys'
+-
+-(* Work out what the effective target should be. *)
+-let effective_target features =
+- try
+- match List.find (fun feature ->
+- match feature with Requires_feature _ -> true
+- | Requires_arch _ -> true
+- | Requires_FP_bit 1 -> true
+- | _ -> false)
+- features with
+- Requires_feature "FMA" -> "arm_neonv2"
+- | Requires_feature "CRYPTO" -> "arm_crypto"
+- | Requires_arch 8 -> "arm_v8_neon"
+- | Requires_FP_bit 1 -> "arm_neon_fp16"
+- | _ -> assert false
+- with Not_found -> "arm_neon"
+-
+-(* Work out what the testcase optimization level should be, default to -O0. *)
+-let compile_test_optim features =
+- try
+- match List.find (fun feature ->
+- match feature with Compiler_optim _ -> true
+- | _ -> false)
+- features with
+- Compiler_optim opt -> opt
+- | _ -> assert false
+- with Not_found -> "-O0"
+-
+-(* Given an intrinsic shape, produce a regexp that will match
+- the right-hand sides of instructions generated by an intrinsic of
+- that shape. *)
+-let rec analyze_shape shape =
+- let rec n_things n thing =
+- match n with
+- 0 -> []
+- | n -> thing :: (n_things (n - 1) thing)
+- in
+- let rec analyze_shape_elt elt =
+- match elt with
+- Dreg -> "\\[dD\\]\\[0-9\\]+"
+- | Qreg -> "\\[qQ\\]\\[0-9\\]+"
+- | Corereg -> "\\[rR\\]\\[0-9\\]+"
+- | Immed -> "#\\[0-9\\]+"
+- | VecArray (1, elt) ->
+- let elt_regexp = analyze_shape_elt elt in
+- "((\\\\\\{" ^ elt_regexp ^ "\\\\\\})|(" ^ elt_regexp ^ "))"
+- | VecArray (n, elt) ->
+- let elt_regexp = analyze_shape_elt elt in
+- let alt1 = elt_regexp ^ "-" ^ elt_regexp in
+- let alt2 = commas (fun x -> x) (n_things n elt_regexp) "" in
+- "\\\\\\{((" ^ alt1 ^ ")|(" ^ alt2 ^ "))\\\\\\}"
+- | (PtrTo elt | CstPtrTo elt) ->
+- "\\\\\\[" ^ (analyze_shape_elt elt) ^ "\\(:\\[0-9\\]+\\)?\\\\\\]"
+- | Element_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]"
+- | Element_of_qreg -> (analyze_shape_elt Qreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]"
+- | All_elements_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\\\\\]"
+- | Alternatives (elts) -> "(" ^ (String.concat "|" (List.map analyze_shape_elt elts)) ^ ")"
+- in
+- match shape with
+- All (n, elt) -> commas analyze_shape_elt (n_things n elt) ""
+- | Long -> (analyze_shape_elt Qreg) ^ ", " ^ (analyze_shape_elt Dreg) ^
+- ", " ^ (analyze_shape_elt Dreg)
+- | Long_noreg elt -> (analyze_shape_elt elt) ^ ", " ^ (analyze_shape_elt elt)
+- | Wide -> (analyze_shape_elt Qreg) ^ ", " ^ (analyze_shape_elt Qreg) ^
+- ", " ^ (analyze_shape_elt Dreg)
+- | Wide_noreg elt -> analyze_shape (Long_noreg elt)
+- | Narrow -> (analyze_shape_elt Dreg) ^ ", " ^ (analyze_shape_elt Qreg) ^
+- ", " ^ (analyze_shape_elt Qreg)
+- | Use_operands elts -> commas analyze_shape_elt (Array.to_list elts) ""
+- | By_scalar Dreg ->
+- analyze_shape (Use_operands [| Dreg; Dreg; Element_of_dreg |])
+- | By_scalar Qreg ->
+- analyze_shape (Use_operands [| Qreg; Qreg; Element_of_dreg |])
+- | By_scalar _ -> assert false
+- | Wide_lane ->
+- analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |])
+- | Wide_scalar ->
+- analyze_shape (Use_operands [| Qreg; Dreg; Element_of_dreg |])
+- | Pair_result elt ->
+- let elt_regexp = analyze_shape_elt elt in
+- elt_regexp ^ ", " ^ elt_regexp
+- | Unary_scalar _ -> "FIXME Unary_scalar"
+- | Binary_imm elt -> analyze_shape (Use_operands [| elt; elt; Immed |])
+- | Narrow_imm -> analyze_shape (Use_operands [| Dreg; Qreg; Immed |])
+- | Long_imm -> analyze_shape (Use_operands [| Qreg; Dreg; Immed |])
+-
+-(* Generate tests for one intrinsic. *)
+-let test_intrinsic dir opcode features shape name munge elt_ty =
+- (* Open the test source file. *)
+- let test_name = name ^ (string_of_elt elt_ty) in
+- let chan = open_test_file dir test_name in
+- (* Work out what argument and return types the intrinsic has. *)
+- let c_arity, new_elt_ty = munge shape elt_ty in
+- let c_types = check_types (strings_of_arity c_arity) in
+- (* Extract any constant valuator (a function specifying what constant
+- values are to be written into the intrinsic call) from the features
+- list. *)
+- let const_valuator =
+- try
+- match (List.find (fun feature -> match feature with
+- Const_valuator _ -> true
+- | _ -> false) features) with
+- Const_valuator f -> Some f
+- | _ -> assert false
+- with Not_found -> None
+- in
+- (* Work out what instruction name(s) to expect. *)
+- let insns = get_insn_names features name in
+- let no_suffix = (new_elt_ty = NoElts) in
+- let insns =
+- if no_suffix then insns
+- else List.map (fun insn ->
+- let suffix = string_of_elt_dots new_elt_ty in
+- insn ^ "\\." ^ suffix) insns
+- in
+- (* Construct a regexp to match against the expected instruction name(s). *)
+- let insn_regexp =
+- match insns with
+- [] -> assert false
+- | [insn] -> insn
+- | _ ->
+- let rec calc_regexp insns cur_regexp =
+- match insns with
+- [] -> cur_regexp
+- | [insn] -> cur_regexp ^ "(" ^ insn ^ "))"
+- | insn::insns -> calc_regexp insns (cur_regexp ^ "(" ^ insn ^ ")|")
+- in calc_regexp insns "("
+- in
+- (* Construct regexps to match against the instructions that this
+- intrinsic expands to. Watch out for any writeback character and
+- comments after the instruction. *)
+- let regexps = List.map (fun regexp -> insn_regexp ^ "\\[ \t\\]+" ^ regexp ^
+- "!?\\(\\[ \t\\]+@\\[a-zA-Z0-9 \\]+\\)?\\n")
+- (analyze_all_shapes features shape analyze_shape)
+- in
+- let effective_target = effective_target features in
+- let compile_test_optim = compile_test_optim features
+- in
+- (* Emit file and function prologues. *)
+- emit_prologue chan test_name effective_target compile_test_optim;
+-
+- if (compare compile_test_optim "-O0") <> 0 then
+- (* Emit variable declarations. *)
+- emit_variables chan c_types features "";
+-
+- Printf.fprintf chan "void test_%s (void)\n{\n" test_name;
+-
+- if compare compile_test_optim "-O0" = 0 then
+- (* Emit variable declarations. *)
+- emit_variables chan c_types features " ";
+-
+- Printf.fprintf chan "\n";
+- (* Emit the call to the intrinsic. *)
+- emit_call chan const_valuator c_types name elt_ty;
+- (* Emit the function epilogue and the DejaGNU scan-assembler directives. *)
+- emit_epilogue chan features regexps;
+- (* Close the test file. *)
+- close_out chan
+-
+-(* Generate tests for one element of the "ops" table. *)
+-let test_intrinsic_group dir (opcode, features, shape, name, munge, types) =
+- List.iter (test_intrinsic dir opcode features shape name munge) types
+-
+-(* Program entry point. *)
+-let _ =
+- let directory = if Array.length Sys.argv <> 1 then Sys.argv.(1) else "." in
+- List.iter (test_intrinsic_group directory) (reinterp @ reinterpq @ ops)
+-
+--- a/src/gcc/config/arm/neon.md
++++ b/src/gcc/config/arm/neon.md
+@@ -406,7 +406,7 @@
+ (match_operand:SI 2 "immediate_operand" "")]
+ "TARGET_NEON"
+ {
+- HOST_WIDE_INT elem = (HOST_WIDE_INT) 1 << INTVAL (operands[2]);
++ HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]);
+ emit_insn (gen_vec_set<mode>_internal (operands[0], operands[1],
+ GEN_INT (elem), operands[0]));
+ DONE;
+@@ -505,6 +505,20 @@
+ (const_string "neon_add<q>")))]
+ )
+
++(define_insn "add<mode>3_fp16"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (plus:VH
++ (match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")))]
++ "TARGET_NEON_FP16INST"
++ "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set (attr "type")
++ (if_then_else (match_test "<Is_float_mode>")
++ (const_string "neon_fp_addsub_s<q>")
++ (const_string "neon_add<q>")))]
++)
++
+ (define_insn "adddi3_neon"
+ [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
+ (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
+@@ -543,6 +557,17 @@
+ (const_string "neon_sub<q>")))]
+ )
+
++(define_insn "sub<mode>3_fp16"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (minus:VH
++ (match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")))]
++ "TARGET_NEON_FP16INST"
++ "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_sub<q>")]
++)
++
+ (define_insn "subdi3_neon"
+ [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
+ (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
+@@ -591,6 +616,16 @@
+ (const_string "neon_mla_<V_elem_ch><q>")))]
+ )
+
++(define_insn "mul<mode>3add<mode>_neon"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
++ (match_operand:VH 3 "s_register_operand" "w"))
++ (match_operand:VH 1 "s_register_operand" "0")))]
++ "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
++ "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
++ [(set_attr "type" "neon_fp_mla_s<q>")]
++)
++
+ (define_insn "mul<mode>3neg<mode>add<mode>_neon"
+ [(set (match_operand:VDQW 0 "s_register_operand" "=w")
+ (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
+@@ -629,6 +664,19 @@
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+ )
+
++;; There is limited support for unsafe-math optimizations using the NEON FP16
++;; arithmetic instructions, so only the intrinsic is currently supported.
++(define_insn "fma<VH:mode>4_intrinsic"
++ [(set (match_operand:VH 0 "register_operand" "=w")
++ (fma:VH
++ (match_operand:VH 1 "register_operand" "w")
++ (match_operand:VH 2 "register_operand" "w")
++ (match_operand:VH 3 "register_operand" "0")))]
++ "TARGET_NEON_FP16INST"
++ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_mla_s<q>")]
++)
++
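
The comment above notes that only the intrinsic form of FP16 fused multiply-add is wired up. A minimal C sketch of how that pattern is normally reached, assuming an FP16-capable target and the ACLE vfma_f16 intrinsic from arm_neon.h (illustrative only, not part of the patch):

  /* Sketch only: assumes <arm_neon.h> provides the ACLE FP16 vector
     intrinsics and the compiler targets a CPU with the FP16 extension.  */
  #include <arm_neon.h>

  float16x4_t
  fused_mla (float16x4_t acc, float16x4_t a, float16x4_t b)
  {
    /* Computes acc + a * b in one operation; with this patch it is expected
       to go through the neon_vfma<VH:mode> expander added later in the patch
       to the fma<VH:mode>4_intrinsic pattern above, emitting vfma.f16 rather
       than a separate multiply and add.  */
    return vfma_f16 (acc, a, b);
  }
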
+ (define_insn "*fmsub<VCVTF:mode>4"
+ [(set (match_operand:VCVTF 0 "register_operand" "=w")
+ (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+@@ -640,13 +688,25 @@
+ )
+
+ (define_insn "fmsub<VCVTF:mode>4_intrinsic"
+- [(set (match_operand:VCVTF 0 "register_operand" "=w")
+- (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+- (match_operand:VCVTF 2 "register_operand" "w")
+- (match_operand:VCVTF 3 "register_operand" "0")))]
+- "TARGET_NEON && TARGET_FMA"
+- "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+- [(set_attr "type" "neon_fp_mla_s<q>")]
++ [(set (match_operand:VCVTF 0 "register_operand" "=w")
++ (fma:VCVTF
++ (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
++ (match_operand:VCVTF 2 "register_operand" "w")
++ (match_operand:VCVTF 3 "register_operand" "0")))]
++ "TARGET_NEON && TARGET_FMA"
++ "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_mla_s<q>")]
++)
++
++(define_insn "fmsub<VH:mode>4_intrinsic"
++ [(set (match_operand:VH 0 "register_operand" "=w")
++ (fma:VH
++ (neg:VH (match_operand:VH 1 "register_operand" "w"))
++ (match_operand:VH 2 "register_operand" "w")
++ (match_operand:VH 3 "register_operand" "0")))]
++ "TARGET_NEON_FP16INST"
++ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_mla_s<q>")]
+ )
+
+ (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
+@@ -860,6 +920,44 @@
+ ""
+ )
+
++(define_insn "<absneg_str><mode>2"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
++ "TARGET_NEON_FP16INST"
++ "v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_abs<q>")]
++)
++
++(define_expand "neon_v<absneg_str><mode>"
++ [(set
++ (match_operand:VH 0 "s_register_operand")
++ (ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
++ "TARGET_NEON_FP16INST"
++{
++ emit_insn (gen_<absneg_str><mode>2 (operands[0], operands[1]));
++ DONE;
++})
++
++(define_insn "neon_v<fp16_rnd_str><mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH
++ [(match_operand:VH 1 "s_register_operand" "w")]
++ FP16_RND))]
++ "TARGET_NEON_FP16INST"
++ "<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_fp_round_s<q>")]
++)
++
++(define_insn "neon_vrsqrte<mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH
++ [(match_operand:VH 1 "s_register_operand" "w")]
++ UNSPEC_VRSQRTE))]
++ "TARGET_NEON_FP16INST"
++ "vrsqrte.f16\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_fp_rsqrte_s<q>")]
++)
++
+ (define_insn "*umin<mode>3_neon"
+ [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+ (umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+@@ -1082,7 +1180,7 @@
+ }
+ else
+ {
+- if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
++ if (operands[2] == CONST1_RTX (SImode)
+ && (!reg_overlap_mentioned_p (operands[0], operands[1])
+ || REGNO (operands[0]) == REGNO (operands[1])))
+ /* This clobbers CC. */
+@@ -1184,7 +1282,7 @@
+ }
+ else
+ {
+- if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
++ if (operands[2] == CONST1_RTX (SImode)
+ && (!reg_overlap_mentioned_p (operands[0], operands[1])
+ || REGNO (operands[0]) == REGNO (operands[1])))
+ /* This clobbers CC. */
+@@ -1204,16 +1302,133 @@
+
+ ;; Widening operations
+
++(define_expand "widen_ssum<mode>3"
++ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
++ (plus:<V_double_width>
++ (sign_extend:<V_double_width>
++ (match_operand:VQI 1 "s_register_operand" ""))
++ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
++ "TARGET_NEON"
++ {
++ machine_mode mode = GET_MODE (operands[1]);
++ rtx p1, p2;
++
++ p1 = arm_simd_vect_par_cnst_half (mode, false);
++ p2 = arm_simd_vect_par_cnst_half (mode, true);
++
++ if (operands[0] != operands[2])
++ emit_move_insn (operands[0], operands[2]);
++
++ emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0],
++ operands[1],
++ p1,
++ operands[0]));
++ emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0],
++ operands[1],
++ p2,
++ operands[0]));
++ DONE;
++ }
++)
++
++(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
++ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
++ (plus:<VW:V_widen>
++ (sign_extend:<VW:V_widen>
++ (vec_select:VW
++ (match_operand:VQI 1 "s_register_operand" "%w")
++ (match_operand:VQI 2 "vect_par_constant_low" "")))
++ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
++ "TARGET_NEON"
++{
++ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1" :
++ "vaddw.<V_s_elem>\t%q0, %q3, %e1";
++}
++ [(set_attr "type" "neon_add_widen")])
++
++(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
++ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
++ (plus:<VW:V_widen>
++ (sign_extend:<VW:V_widen>
++ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
++ (match_operand:VQI 2 "vect_par_constant_high" "")))
++ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
++ "TARGET_NEON"
++{
++ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %e1" :
++ "vaddw.<V_s_elem>\t%q0, %q3, %f1";
++}
++ [(set_attr "type" "neon_add_widen")])
++
+ (define_insn "widen_ssum<mode>3"
+ [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
+- (plus:<V_widen> (sign_extend:<V_widen>
+- (match_operand:VW 1 "s_register_operand" "%w"))
+- (match_operand:<V_widen> 2 "s_register_operand" "w")))]
++ (plus:<V_widen>
++ (sign_extend:<V_widen>
++ (match_operand:VW 1 "s_register_operand" "%w"))
++ (match_operand:<V_widen> 2 "s_register_operand" "w")))]
+ "TARGET_NEON"
+ "vaddw.<V_s_elem>\t%q0, %q2, %P1"
+ [(set_attr "type" "neon_add_widen")]
+ )
+
++(define_expand "widen_usum<mode>3"
++ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
++ (plus:<V_double_width>
++ (zero_extend:<V_double_width>
++ (match_operand:VQI 1 "s_register_operand" ""))
++ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
++ "TARGET_NEON"
++ {
++ machine_mode mode = GET_MODE (operands[1]);
++ rtx p1, p2;
++
++ p1 = arm_simd_vect_par_cnst_half (mode, false);
++ p2 = arm_simd_vect_par_cnst_half (mode, true);
++
++ if (operands[0] != operands[2])
++ emit_move_insn (operands[0], operands[2]);
++
++ emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0],
++ operands[1],
++ p1,
++ operands[0]));
++ emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0],
++ operands[1],
++ p2,
++ operands[0]));
++ DONE;
++ }
++)
++
++(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
++ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
++ (plus:<VW:V_widen>
++ (zero_extend:<VW:V_widen>
++ (vec_select:VW
++ (match_operand:VQI 1 "s_register_operand" "%w")
++ (match_operand:VQI 2 "vect_par_constant_low" "")))
++ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
++ "TARGET_NEON"
++{
++ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %f1" :
++ "vaddw.<V_u_elem>\t%q0, %q3, %e1";
++}
++ [(set_attr "type" "neon_add_widen")])
++
++(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
++ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
++ (plus:<VW:V_widen>
++ (zero_extend:<VW:V_widen>
++ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
++ (match_operand:VQI 2 "vect_par_constant_high" "")))
++ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
++ "TARGET_NEON"
++{
++ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %e1" :
++ "vaddw.<V_u_elem>\t%q0, %q3, %f1";
++}
++ [(set_attr "type" "neon_add_widen")])
++
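
The widen_ssum/widen_usum expanders above split a widening sum across the low and high halves of a quad-word input so each half can use vaddw. A hedged C sketch of the sort of reduction they exist for; whether the vectorizer actually selects these patterns depends on the options and cost model in effect:

  /* Sketch only: a widening sum reduction of the kind widen_ssum<mode>3
     is meant to support.  */
  #include <stdint.h>

  int16_t
  sum_bytes (const int8_t *a, int n)
  {
    int16_t s = 0;
    for (int i = 0; i < n; i++)
      s += a[i];   /* each int8_t element is sign-extended before the add */
    return s;
  }
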
+ (define_insn "widen_usum<mode>3"
+ [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
+ (plus:<V_widen> (zero_extend:<V_widen>
+@@ -1484,6 +1699,17 @@
+ (const_string "neon_reduc_add<q>")))]
+ )
+
++(define_insn "neon_vpaddv4hf"
++ [(set
++ (match_operand:V4HF 0 "s_register_operand" "=w")
++ (unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
++ (match_operand:V4HF 2 "s_register_operand" "w")]
++ UNSPEC_VPADD))]
++ "TARGET_NEON_FP16INST"
++ "vpadd.f16\t%P0, %P1, %P2"
++ [(set_attr "type" "neon_reduc_add")]
++)
++
+ (define_insn "neon_vpsmin<mode>"
+ [(set (match_operand:VD 0 "s_register_operand" "=w")
+ (unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
+@@ -1832,6 +2058,26 @@
+ DONE;
+ })
+
++(define_expand "neon_vadd<mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "s_register_operand")]
++ "TARGET_NEON_FP16INST"
++{
++ emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
++ DONE;
++})
++
++(define_expand "neon_vsub<mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "s_register_operand")]
++ "TARGET_NEON_FP16INST"
++{
++ emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
++ DONE;
++})
++
+ ; Note that NEON operations don't support the full IEEE 754 standard: in
+ ; particular, denormal values are flushed to zero. This means that GCC cannot
+ ; use those instructions for autovectorization, etc. unless
+@@ -1923,6 +2169,17 @@
+ (const_string "neon_mul_<V_elem_ch><q>")))]
+ )
+
++(define_insn "neon_vmulf<mode>"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (mult:VH
++ (match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")))]
++ "TARGET_NEON_FP16INST"
++ "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
++)
++
+ (define_expand "neon_vmla<mode>"
+ [(match_operand:VDQW 0 "s_register_operand" "=w")
+ (match_operand:VDQW 1 "s_register_operand" "0")
+@@ -1951,6 +2208,18 @@
+ DONE;
+ })
+
++(define_expand "neon_vfma<VH:mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "s_register_operand")
++ (match_operand:VH 3 "s_register_operand")]
++ "TARGET_NEON_FP16INST"
++{
++ emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
++ operands[1]));
++ DONE;
++})
++
+ (define_expand "neon_vfms<VCVTF:mode>"
+ [(match_operand:VCVTF 0 "s_register_operand")
+ (match_operand:VCVTF 1 "s_register_operand")
+@@ -1963,6 +2232,18 @@
+ DONE;
+ })
+
++(define_expand "neon_vfms<VH:mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "s_register_operand")
++ (match_operand:VH 3 "s_register_operand")]
++ "TARGET_NEON_FP16INST"
++{
++ emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
++ operands[1]));
++ DONE;
++})
++
+ ; Used for intrinsics when flag_unsafe_math_optimizations is false.
+
+ (define_insn "neon_vmla<mode>_unspec"
+@@ -2263,6 +2544,72 @@
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+ )
+
++(define_expand "neon_vc<cmp_op><mode>"
++ [(match_operand:<V_cmp_result> 0 "s_register_operand")
++ (neg:<V_cmp_result>
++ (COMPARISONS:VH
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "reg_or_zero_operand")))]
++ "TARGET_NEON_FP16INST"
++{
++ /* For FP comparisons use UNSPECS unless -funsafe-math-optimizations
++ are enabled. */
++ if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
++ && !flag_unsafe_math_optimizations)
++ emit_insn
++ (gen_neon_vc<cmp_op><mode>_fp16insn_unspec
++ (operands[0], operands[1], operands[2]));
++ else
++ emit_insn
++ (gen_neon_vc<cmp_op><mode>_fp16insn
++ (operands[0], operands[1], operands[2]));
++ DONE;
++})
++
++(define_insn "neon_vc<cmp_op><mode>_fp16insn"
++ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
++ (neg:<V_cmp_result>
++ (COMPARISONS:<V_cmp_result>
++ (match_operand:VH 1 "s_register_operand" "w,w")
++ (match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
++ "TARGET_NEON_FP16INST
++ && !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
++ && !flag_unsafe_math_optimizations)"
++{
++ char pattern[100];
++ sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
++ " %%<V_reg>1, %s",
++ GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
++ ? "f" : "<cmp_type>",
++ which_alternative == 0
++ ? "%<V_reg>2" : "#0");
++ output_asm_insn (pattern, operands);
++ return "";
++}
++ [(set (attr "type")
++ (if_then_else (match_operand 2 "zero_operand")
++ (const_string "neon_compare_zero<q>")
++ (const_string "neon_compare<q>")))])
++
++(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
++ [(set
++ (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
++ (unspec:<V_cmp_result>
++ [(match_operand:VH 1 "s_register_operand" "w,w")
++ (match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
++ NEON_VCMP))]
++ "TARGET_NEON_FP16INST"
++{
++ char pattern[100];
++ sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
++ " %%<V_reg>1, %s",
++ which_alternative == 0
++ ? "%<V_reg>2" : "#0");
++ output_asm_insn (pattern, operands);
++ return "";
++}
++ [(set_attr "type" "neon_fp_compare_s<q>")])
++
+ (define_insn "neon_vc<cmp_op>u<mode>"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+ (neg:<V_cmp_result>
+@@ -2314,6 +2661,60 @@
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+ )
+
++(define_expand "neon_vca<cmp_op><mode>"
++ [(set
++ (match_operand:<V_cmp_result> 0 "s_register_operand")
++ (neg:<V_cmp_result>
++ (GLTE:<V_cmp_result>
++ (abs:VH (match_operand:VH 1 "s_register_operand"))
++ (abs:VH (match_operand:VH 2 "s_register_operand")))))]
++ "TARGET_NEON_FP16INST"
++{
++ if (flag_unsafe_math_optimizations)
++ emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
++ (operands[0], operands[1], operands[2]));
++ else
++ emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
++ (operands[0], operands[1], operands[2]));
++ DONE;
++})
++
++(define_insn "neon_vca<cmp_op><mode>_fp16insn"
++ [(set
++ (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
++ (neg:<V_cmp_result>
++ (GLTE:<V_cmp_result>
++ (abs:VH (match_operand:VH 1 "s_register_operand" "w"))
++ (abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
++ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
++ "vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_compare_s<q>")]
++)
++
++(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
++ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
++ (unspec:<V_cmp_result>
++ [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ NEON_VAGLTE))]
++ "TARGET_NEON"
++ "vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_compare_s<q>")]
++)
++
++(define_expand "neon_vc<cmp_op>z<mode>"
++ [(set
++ (match_operand:<V_cmp_result> 0 "s_register_operand")
++ (COMPARISONS:<V_cmp_result>
++ (match_operand:VH 1 "s_register_operand")
++ (const_int 0)))]
++ "TARGET_NEON_FP16INST"
++ {
++ emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
++ CONST0_RTX (<MODE>mode)));
++ DONE;
++})
++
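
The FP16 comparison expanders above use the UNSPEC insns unless -funsafe-math-optimizations is enabled, but either way the user-visible result is a per-lane all-ones/all-zeros mask. A hedged C sketch, assuming the ACLE vcge_f16 intrinsic from arm_neon.h (illustrative only):

  /* Sketch only: assumes <arm_neon.h> provides the ACLE FP16 comparison
     intrinsics.  */
  #include <arm_neon.h>

  uint16x4_t
  greater_or_equal (float16x4_t a, float16x4_t b)
  {
    /* Element-wise a >= b; each result lane is 0xffff when the comparison
       holds and 0 otherwise (vcge.f16).  */
    return vcge_f16 (a, b);
  }
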
+ (define_insn "neon_vtst<mode>"
+ [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+ (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
+@@ -2334,6 +2735,16 @@
+ [(set_attr "type" "neon_abd<q>")]
+ )
+
++(define_insn "neon_vabd<mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ UNSPEC_VABD_F))]
++ "TARGET_NEON_FP16INST"
++ "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_abd<q>")]
++)
++
+ (define_insn "neon_vabdf<mode>"
+ [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+ (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+@@ -2396,6 +2807,40 @@
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+ )
+
++(define_insn "neon_v<maxmin>f<mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH
++ [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ VMAXMINF))]
++ "TARGET_NEON_FP16INST"
++ "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_minmax_s<q>")]
++)
++
++(define_insn "neon_vp<maxmin>fv4hf"
++ [(set (match_operand:V4HF 0 "s_register_operand" "=w")
++ (unspec:V4HF
++ [(match_operand:V4HF 1 "s_register_operand" "w")
++ (match_operand:V4HF 2 "s_register_operand" "w")]
++ VPMAXMINF))]
++ "TARGET_NEON_FP16INST"
++ "vp<maxmin>.f16\t%P0, %P1, %P2"
++ [(set_attr "type" "neon_reduc_minmax")]
++)
++
++(define_insn "neon_<fmaxmin_op><mode>"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH
++ [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ VMAXMINFNM))]
++ "TARGET_NEON_FP16INST"
++ "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_minmax_s<q>")]
++)
++
+ ;; Vector forms for the IEEE-754 fmax()/fmin() functions
+ (define_insn "<fmaxmin><mode>3"
+ [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+@@ -2467,6 +2912,17 @@
+ [(set_attr "type" "neon_fp_recps_s<q>")]
+ )
+
++(define_insn "neon_vrecps<mode>"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ UNSPEC_VRECPS))]
++ "TARGET_NEON_FP16INST"
++ "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_recps_s<q>")]
++)
++
+ (define_insn "neon_vrsqrts<mode>"
+ [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
+ (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
+@@ -2477,6 +2933,17 @@
+ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
+ )
+
++(define_insn "neon_vrsqrts<mode>"
++ [(set
++ (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:VH 2 "s_register_operand" "w")]
++ UNSPEC_VRSQRTS))]
++ "TARGET_NEON_FP16INST"
++ "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
++ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
++)
++
+ (define_expand "neon_vabs<mode>"
+ [(match_operand:VDQW 0 "s_register_operand" "")
+ (match_operand:VDQW 1 "s_register_operand" "")]
+@@ -2592,6 +3059,15 @@
+ })
+
+ (define_insn "neon_vrecpe<mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
++ UNSPEC_VRECPE))]
++ "TARGET_NEON_FP16INST"
++ "vrecpe.f16\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_fp_recpe_s<q>")]
++)
++
++(define_insn "neon_vrecpe<mode>"
+ [(set (match_operand:V32 0 "s_register_operand" "=w")
+ (unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
+ UNSPEC_VRECPE))]
+@@ -2928,6 +3404,28 @@ if (BYTES_BIG_ENDIAN)
+ [(set_attr "type" "neon_dup<q>")]
+ )
+
++(define_insn "neon_vdup_lane<mode>_internal"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (vec_duplicate:VH
++ (vec_select:<V_elem>
++ (match_operand:<V_double_vector_mode> 1 "s_register_operand" "w")
++ (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
++ "TARGET_NEON && TARGET_FP16"
++{
++ if (BYTES_BIG_ENDIAN)
++ {
++ int elt = INTVAL (operands[2]);
++ elt = GET_MODE_NUNITS (<V_double_vector_mode>mode) - 1 - elt;
++ operands[2] = GEN_INT (elt);
++ }
++ if (<Is_d_reg>)
++ return "vdup.<V_sz_elem>\t%P0, %P1[%c2]";
++ else
++ return "vdup.<V_sz_elem>\t%q0, %P1[%c2]";
++}
++ [(set_attr "type" "neon_dup<q>")]
++)
++
+ (define_expand "neon_vdup_lane<mode>"
+ [(match_operand:VDQW 0 "s_register_operand" "=w")
+ (match_operand:<V_double_vector_mode> 1 "s_register_operand" "w")
+@@ -2947,6 +3445,25 @@ if (BYTES_BIG_ENDIAN)
+ DONE;
+ })
+
++(define_expand "neon_vdup_lane<mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:<V_double_vector_mode> 1 "s_register_operand")
++ (match_operand:SI 2 "immediate_operand")]
++ "TARGET_NEON && TARGET_FP16"
++{
++ if (BYTES_BIG_ENDIAN)
++ {
++ unsigned int elt = INTVAL (operands[2]);
++ unsigned int reg_nelts
++ = 64 / GET_MODE_UNIT_BITSIZE (<V_double_vector_mode>mode);
++ elt ^= reg_nelts - 1;
++ operands[2] = GEN_INT (elt);
++ }
++ emit_insn (gen_neon_vdup_lane<mode>_internal (operands[0], operands[1],
++ operands[2]));
++ DONE;
++})
++
+ ; Scalar index is ignored, since only zero is valid here.
+ (define_expand "neon_vdup_lanedi"
+ [(match_operand:DI 0 "s_register_operand" "=w")
+@@ -3093,6 +3610,28 @@ if (BYTES_BIG_ENDIAN)
+ [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
+ )
+
++(define_insn "neon_vcvt<sup><mode>"
++ [(set
++ (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
++ (unspec:<VH_CVTTO>
++ [(match_operand:VCVTHI 1 "s_register_operand" "w")]
++ VCVT_US))]
++ "TARGET_NEON_FP16INST"
++ "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
++)
++
++(define_insn "neon_vcvt<sup><mode>"
++ [(set
++ (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
++ (unspec:<VH_CVTTO>
++ [(match_operand:VH 1 "s_register_operand" "w")]
++ VCVT_US))]
++ "TARGET_NEON_FP16INST"
++ "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
++)
++
+ (define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+ (unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
+@@ -3107,6 +3646,20 @@ if (BYTES_BIG_ENDIAN)
+ )
+
+ (define_insn "neon_vcvt<sup>_n<mode>"
++ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
++ (unspec:<VH_CVTTO>
++ [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ VCVT_US_N))]
++ "TARGET_NEON_FP16INST"
++{
++ neon_const_bounds (operands[2], 0, 17);
++ return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
++}
++ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
++)
++
++(define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
+ (unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
+@@ -3119,6 +3672,31 @@ if (BYTES_BIG_ENDIAN)
+ [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
+ )
+
++(define_insn "neon_vcvt<sup>_n<mode>"
++ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
++ (unspec:<VH_CVTTO>
++ [(match_operand:VCVTHI 1 "s_register_operand" "w")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ VCVT_US_N))]
++ "TARGET_NEON_FP16INST"
++{
++ neon_const_bounds (operands[2], 0, 17);
++ return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
++}
++ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
++)
++
++(define_insn "neon_vcvt<vcvth_op><sup><mode>"
++ [(set
++ (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
++ (unspec:<VH_CVTTO>
++ [(match_operand:VH 1 "s_register_operand" "w")]
++ VCVT_HF_US))]
++ "TARGET_NEON_FP16INST"
++ "vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
++ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
++)
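
The neon_vcvt<vcvth_op> pattern above covers the directed-rounding FP16 to integer conversions (vcvta, vcvtm, vcvtn, vcvtp). A hedged C sketch using the ACLE intrinsic name vcvta_s16_f16, assumed here for illustration:

  /* Sketch only: assumes <arm_neon.h> provides the FP16 directed-rounding
     conversion intrinsics.  */
  #include <arm_neon.h>

  int16x4_t
  round_ties_away (float16x4_t x)
  {
    /* Convert each half-precision lane to a signed 16-bit integer, rounding
       to nearest with ties away from zero (vcvta.s16.f16).  */
    return vcvta_s16_f16 (x);
  }
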
++
+ (define_insn "neon_vmovn<mode>"
+ [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
+ (unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
+@@ -3189,6 +3767,18 @@ if (BYTES_BIG_ENDIAN)
+ (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
+ )
+
++(define_insn "neon_vmul_lane<mode>"
++ [(set (match_operand:VH 0 "s_register_operand" "=w")
++ (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
++ (match_operand:V4HF 2 "s_register_operand"
++ "<scalar_mul_constraint>")
++ (match_operand:SI 3 "immediate_operand" "i")]
++ UNSPEC_VMUL_LANE))]
++ "TARGET_NEON_FP16INST"
++ "vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
++ [(set_attr "type" "neon_fp_mul_s_scalar<q>")]
++)
++
+ (define_insn "neon_vmull<sup>_lane<mode>"
+ [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
+ (unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
+@@ -3443,6 +4033,19 @@ if (BYTES_BIG_ENDIAN)
+ DONE;
+ })
+
++(define_expand "neon_vmul_n<mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:<V_elem> 2 "s_register_operand")]
++ "TARGET_NEON_FP16INST"
++{
++ rtx tmp = gen_reg_rtx (V4HFmode);
++ emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
++ emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
++ const0_rtx));
++ DONE;
++})
++
+ (define_expand "neon_vmulls_n<mode>"
+ [(match_operand:<V_widen> 0 "s_register_operand" "")
+ (match_operand:VMDI 1 "s_register_operand" "")
+@@ -4164,25 +4767,25 @@ if (BYTES_BIG_ENDIAN)
+
+ (define_expand "neon_vtrn<mode>_internal"
+ [(parallel
+- [(set (match_operand:VDQW 0 "s_register_operand" "")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
+- (match_operand:VDQW 2 "s_register_operand" "")]
++ [(set (match_operand:VDQWH 0 "s_register_operand")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
++ (match_operand:VDQWH 2 "s_register_operand")]
+ UNSPEC_VTRN1))
+- (set (match_operand:VDQW 3 "s_register_operand" "")
+- (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VTRN2))])]
++ (set (match_operand:VDQWH 3 "s_register_operand")
++ (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VTRN2))])]
+ "TARGET_NEON"
+ ""
+ )
+
+ ;; Note: Different operand numbering to handle tied registers correctly.
+ (define_insn "*neon_vtrn<mode>_insn"
+- [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
+- (match_operand:VDQW 3 "s_register_operand" "2")]
+- UNSPEC_VTRN1))
+- (set (match_operand:VDQW 2 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_dup 1) (match_dup 3)]
+- UNSPEC_VTRN2))]
++ [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
++ (match_operand:VDQWH 3 "s_register_operand" "2")]
++ UNSPEC_VTRN1))
++ (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_dup 1) (match_dup 3)]
++ UNSPEC_VTRN2))]
+ "TARGET_NEON"
+ "vtrn.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
+ [(set_attr "type" "neon_permute<q>")]
+@@ -4190,25 +4793,25 @@ if (BYTES_BIG_ENDIAN)
+
+ (define_expand "neon_vzip<mode>_internal"
+ [(parallel
+- [(set (match_operand:VDQW 0 "s_register_operand" "")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
+- (match_operand:VDQW 2 "s_register_operand" "")]
+- UNSPEC_VZIP1))
+- (set (match_operand:VDQW 3 "s_register_operand" "")
+- (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))])]
++ [(set (match_operand:VDQWH 0 "s_register_operand")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
++ (match_operand:VDQWH 2 "s_register_operand")]
++ UNSPEC_VZIP1))
++ (set (match_operand:VDQWH 3 "s_register_operand")
++ (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))])]
+ "TARGET_NEON"
+ ""
+ )
+
+ ;; Note: Different operand numbering to handle tied registers correctly.
+ (define_insn "*neon_vzip<mode>_insn"
+- [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
+- (match_operand:VDQW 3 "s_register_operand" "2")]
+- UNSPEC_VZIP1))
+- (set (match_operand:VDQW 2 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_dup 1) (match_dup 3)]
+- UNSPEC_VZIP2))]
++ [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
++ (match_operand:VDQWH 3 "s_register_operand" "2")]
++ UNSPEC_VZIP1))
++ (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_dup 1) (match_dup 3)]
++ UNSPEC_VZIP2))]
+ "TARGET_NEON"
+ "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
+ [(set_attr "type" "neon_zip<q>")]
+@@ -4216,25 +4819,25 @@ if (BYTES_BIG_ENDIAN)
+
+ (define_expand "neon_vuzp<mode>_internal"
+ [(parallel
+- [(set (match_operand:VDQW 0 "s_register_operand" "")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "")
+- (match_operand:VDQW 2 "s_register_operand" "")]
++ [(set (match_operand:VDQWH 0 "s_register_operand")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand")
++ (match_operand:VDQWH 2 "s_register_operand")]
+ UNSPEC_VUZP1))
+- (set (match_operand:VDQW 3 "s_register_operand" "")
+- (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VUZP2))])]
++ (set (match_operand:VDQWH 3 "s_register_operand" "")
++ (unspec:VDQWH [(match_dup 1) (match_dup 2)] UNSPEC_VUZP2))])]
+ "TARGET_NEON"
+ ""
+ )
+
+ ;; Note: Different operand numbering to handle tied registers correctly.
+ (define_insn "*neon_vuzp<mode>_insn"
+- [(set (match_operand:VDQW 0 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")
+- (match_operand:VDQW 3 "s_register_operand" "2")]
+- UNSPEC_VUZP1))
+- (set (match_operand:VDQW 2 "s_register_operand" "=&w")
+- (unspec:VDQW [(match_dup 1) (match_dup 3)]
+- UNSPEC_VUZP2))]
++ [(set (match_operand:VDQWH 0 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_operand:VDQWH 1 "s_register_operand" "0")
++ (match_operand:VDQWH 3 "s_register_operand" "2")]
++ UNSPEC_VUZP1))
++ (set (match_operand:VDQWH 2 "s_register_operand" "=&w")
++ (unspec:VDQWH [(match_dup 1) (match_dup 3)]
++ UNSPEC_VUZP2))]
+ "TARGET_NEON"
+ "vuzp.<V_sz_elem>\t%<V_reg>0, %<V_reg>2"
+ [(set_attr "type" "neon_zip<q>")]
+--- a/src/gcc/config/arm/neon.ml
++++ b/src//dev/null
+@@ -1,2357 +0,0 @@
+-(* Common code for ARM NEON header file, documentation and test case
+- generators.
+-
+- Copyright (C) 2006-2016 Free Software Foundation, Inc.
+- Contributed by CodeSourcery.
+-
+- This file is part of GCC.
+-
+- GCC is free software; you can redistribute it and/or modify it under
+- the terms of the GNU General Public License as published by the Free
+- Software Foundation; either version 3, or (at your option) any later
+- version.
+-
+- GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+- WARRANTY; without even the implied warranty of MERCHANTABILITY or
+- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+- for more details.
+-
+- You should have received a copy of the GNU General Public License
+- along with GCC; see the file COPYING3. If not see
+- <http://www.gnu.org/licenses/>. *)
+-
+-(* Shorthand types for vector elements. *)
+-type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
+- | P64 | P128 | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
+- | Cast of elts * elts | NoElts
+-
+-type eltclass = Signed | Unsigned | Float | Poly | Int | Bits
+- | ConvClass of eltclass * eltclass | NoType
+-
+-(* These vector types correspond directly to C types. *)
+-type vectype = T_int8x8 | T_int8x16
+- | T_int16x4 | T_int16x8
+- | T_int32x2 | T_int32x4
+- | T_int64x1 | T_int64x2
+- | T_uint8x8 | T_uint8x16
+- | T_uint16x4 | T_uint16x8
+- | T_uint32x2 | T_uint32x4
+- | T_uint64x1 | T_uint64x2
+- | T_float16x4
+- | T_float32x2 | T_float32x4
+- | T_poly8x8 | T_poly8x16
+- | T_poly16x4 | T_poly16x8
+- | T_immediate of int * int
+- | T_int8 | T_int16
+- | T_int32 | T_int64
+- | T_uint8 | T_uint16
+- | T_uint32 | T_uint64
+- | T_poly8 | T_poly16
+- | T_poly64 | T_poly64x1
+- | T_poly64x2 | T_poly128
+- | T_float16 | T_float32
+- | T_arrayof of int * vectype
+- | T_ptrto of vectype | T_const of vectype
+- | T_void | T_intQI
+- | T_intHI | T_intSI
+- | T_intDI | T_intTI
+- | T_floatHF | T_floatSF
+-
+-(* The meanings of the following are:
+- TImode : "Tetra", two registers (four words).
+- EImode : "hExa", three registers (six words).
+- OImode : "Octa", four registers (eight words).
+- CImode : "dodeCa", six registers (twelve words).
+- XImode : "heXadeca", eight registers (sixteen words).
+-*)
+-
+-type inttype = B_TImode | B_EImode | B_OImode | B_CImode | B_XImode
+-
+-type shape_elt = Dreg | Qreg | Corereg | Immed | VecArray of int * shape_elt
+- | PtrTo of shape_elt | CstPtrTo of shape_elt
+- (* These next ones are used only in the test generator. *)
+- | Element_of_dreg (* Used for "lane" variants. *)
+- | Element_of_qreg (* Likewise. *)
+- | All_elements_of_dreg (* Used for "dup" variants. *)
+- | Alternatives of shape_elt list (* Used for multiple valid operands *)
+-
+-type shape_form = All of int * shape_elt
+- | Long
+- | Long_noreg of shape_elt
+- | Wide
+- | Wide_noreg of shape_elt
+- | Narrow
+- | Long_imm
+- | Narrow_imm
+- | Binary_imm of shape_elt
+- | Use_operands of shape_elt array
+- | By_scalar of shape_elt
+- | Unary_scalar of shape_elt
+- | Wide_lane
+- | Wide_scalar
+- | Pair_result of shape_elt
+-
+-type arity = Arity0 of vectype
+- | Arity1 of vectype * vectype
+- | Arity2 of vectype * vectype * vectype
+- | Arity3 of vectype * vectype * vectype * vectype
+- | Arity4 of vectype * vectype * vectype * vectype * vectype
+-
+-type vecmode = V8QI | V4HI | V4HF |V2SI | V2SF | DI
+- | V16QI | V8HI | V4SI | V4SF | V2DI | TI
+- | QI | HI | SI | SF
+-
+-type opcode =
+- (* Binary ops. *)
+- Vadd
+- | Vmul
+- | Vmla
+- | Vmls
+- | Vfma
+- | Vfms
+- | Vsub
+- | Vceq
+- | Vcge
+- | Vcgt
+- | Vcle
+- | Vclt
+- | Vcage
+- | Vcagt
+- | Vcale
+- | Vcalt
+- | Vtst
+- | Vabd
+- | Vaba
+- | Vmax
+- | Vmin
+- | Vpadd
+- | Vpada
+- | Vpmax
+- | Vpmin
+- | Vrecps
+- | Vrsqrts
+- | Vshl
+- | Vshr_n
+- | Vshl_n
+- | Vsra_n
+- | Vsri
+- | Vsli
+- (* Logic binops. *)
+- | Vand
+- | Vorr
+- | Veor
+- | Vbic
+- | Vorn
+- | Vbsl
+- (* Ops with scalar. *)
+- | Vmul_lane
+- | Vmla_lane
+- | Vmls_lane
+- | Vmul_n
+- | Vmla_n
+- | Vmls_n
+- | Vmull_n
+- | Vmull_lane
+- | Vqdmull_n
+- | Vqdmull_lane
+- | Vqdmulh_n
+- | Vqdmulh_lane
+- (* Unary ops. *)
+- | Vrintn
+- | Vrinta
+- | Vrintp
+- | Vrintm
+- | Vrintz
+- | Vabs
+- | Vneg
+- | Vcls
+- | Vclz
+- | Vcnt
+- | Vrecpe
+- | Vrsqrte
+- | Vmvn
+- (* Vector extract. *)
+- | Vext
+- (* Reverse elements. *)
+- | Vrev64
+- | Vrev32
+- | Vrev16
+- (* Transposition ops. *)
+- | Vtrn
+- | Vzip
+- | Vuzp
+- (* Loads and stores (VLD1/VST1/VLD2...), elements and structures. *)
+- | Vldx of int
+- | Vstx of int
+- | Vldx_lane of int
+- | Vldx_dup of int
+- | Vstx_lane of int
+- (* Set/extract lanes from a vector. *)
+- | Vget_lane
+- | Vset_lane
+- (* Initialize vector from bit pattern. *)
+- | Vcreate
+- (* Set all lanes to same value. *)
+- | Vdup_n
+- | Vmov_n (* Is this the same? *)
+- (* Duplicate scalar to all lanes of vector. *)
+- | Vdup_lane
+- (* Combine vectors. *)
+- | Vcombine
+- (* Get quadword high/low parts. *)
+- | Vget_high
+- | Vget_low
+- (* Convert vectors. *)
+- | Vcvt
+- | Vcvt_n
+- (* Narrow/lengthen vectors. *)
+- | Vmovn
+- | Vmovl
+- (* Table lookup. *)
+- | Vtbl of int
+- | Vtbx of int
+- (* Reinterpret casts. *)
+- | Vreinterp
+-
+-let rev_elems revsize elsize nelts _ =
+- let mask = (revsize / elsize) - 1 in
+- let arr = Array.init nelts
+- (fun i -> i lxor mask) in
+- Array.to_list arr
+-
+-let permute_range i stride nelts increment =
+- let rec build i = function
+- 0 -> []
+- | nelts -> i :: (i + stride) :: build (i + increment) (pred nelts) in
+- build i nelts
+-
+-(* Generate a list of integers suitable for vzip. *)
+-let zip_range i stride nelts = permute_range i stride nelts 1
+-
+-(* Generate a list of integers suitable for vunzip. *)
+-let uzip_range i stride nelts = permute_range i stride nelts 4
+-
+-(* Generate a list of integers suitable for trn. *)
+-let trn_range i stride nelts = permute_range i stride nelts 2
+-
+-let zip_elems _ nelts part =
+- match part with
+- `lo -> zip_range 0 nelts (nelts / 2)
+- | `hi -> zip_range (nelts / 2) nelts (nelts / 2)
+-
+-let uzip_elems _ nelts part =
+- match part with
+- `lo -> uzip_range 0 2 (nelts / 2)
+- | `hi -> uzip_range 1 2 (nelts / 2)
+-
+-let trn_elems _ nelts part =
+- match part with
+- `lo -> trn_range 0 nelts (nelts / 2)
+- | `hi -> trn_range 1 nelts (nelts / 2)
+-
+-(* Features used for documentation, to distinguish between some instruction
+- variants, and to signal special requirements (e.g. swapping arguments). *)
+-
+-type features =
+- Halving
+- | Rounding
+- | Saturating
+- | Dst_unsign
+- | High_half
+- | Doubling
+- | Flipped of string (* Builtin name to use with flipped arguments. *)
+- | InfoWord (* Pass an extra word for signage/rounding etc. (always passed
+- for All _, Long, Wide, Narrow shape_forms. *)
+- (* Implement builtin as shuffle. The parameter is a function which returns
+- masks suitable for __builtin_shuffle: arguments are (element size,
+- number of elements, high/low part selector). *)
+- | Use_shuffle of (int -> int -> [`lo|`hi] -> int list)
+- (* A specification as to the shape of instruction expected upon
+- disassembly, used if it differs from the shape used to build the
+- intrinsic prototype. Multiple entries in the constructor's argument
+- indicate that the intrinsic expands to more than one assembly
+- instruction, each with a corresponding shape specified here. *)
+- | Disassembles_as of shape_form list
+- | Builtin_name of string (* Override the name of the builtin. *)
+- (* Override the name of the instruction. If more than one name
+- is specified, it means that the instruction can have any of those
+- names. *)
+- | Instruction_name of string list
+- (* Mark that the intrinsic yields no instructions, or expands to yield
+- behavior that the test generator cannot test. *)
+- | No_op
+- (* Mark that the intrinsic has constant arguments that cannot be set
+- to the defaults (zero for pointers and one otherwise) in the test
+- cases. The function supplied must return the integer to be written
+- into the testcase for the argument number (0-based) supplied to it. *)
+- | Const_valuator of (int -> int)
+- | Fixed_vector_reg
+- | Fixed_core_reg
+- (* Mark that the intrinsic requires __ARM_FEATURE_string to be defined. *)
+- | Requires_feature of string
+- (* Mark that the intrinsic requires a particular architecture version. *)
+- | Requires_arch of int
+- (* Mark that the intrinsic requires a particular bit in __ARM_FP to
+- be set. *)
+- | Requires_FP_bit of int
+- (* Compiler optimization level for the test. *)
+- | Compiler_optim of string
+-
+-exception MixedMode of elts * elts
+-
+-let rec elt_width = function
+- S8 | U8 | P8 | I8 | B8 -> 8
+- | S16 | U16 | P16 | I16 | B16 | F16 -> 16
+- | S32 | F32 | U32 | I32 | B32 -> 32
+- | S64 | U64 | P64 | I64 | B64 -> 64
+- | P128 -> 128
+- | Conv (a, b) ->
+- let wa = elt_width a and wb = elt_width b in
+- if wa = wb then wa else raise (MixedMode (a, b))
+- | Cast (a, b) -> raise (MixedMode (a, b))
+- | NoElts -> failwith "No elts"
+-
+-let rec elt_class = function
+- S8 | S16 | S32 | S64 -> Signed
+- | U8 | U16 | U32 | U64 -> Unsigned
+- | P8 | P16 | P64 | P128 -> Poly
+- | F16 | F32 -> Float
+- | I8 | I16 | I32 | I64 -> Int
+- | B8 | B16 | B32 | B64 -> Bits
+- | Conv (a, b) | Cast (a, b) -> ConvClass (elt_class a, elt_class b)
+- | NoElts -> NoType
+-
+-let elt_of_class_width c w =
+- match c, w with
+- Signed, 8 -> S8
+- | Signed, 16 -> S16
+- | Signed, 32 -> S32
+- | Signed, 64 -> S64
+- | Float, 16 -> F16
+- | Float, 32 -> F32
+- | Unsigned, 8 -> U8
+- | Unsigned, 16 -> U16
+- | Unsigned, 32 -> U32
+- | Unsigned, 64 -> U64
+- | Poly, 8 -> P8
+- | Poly, 16 -> P16
+- | Poly, 64 -> P64
+- | Poly, 128 -> P128
+- | Int, 8 -> I8
+- | Int, 16 -> I16
+- | Int, 32 -> I32
+- | Int, 64 -> I64
+- | Bits, 8 -> B8
+- | Bits, 16 -> B16
+- | Bits, 32 -> B32
+- | Bits, 64 -> B64
+- | _ -> failwith "Bad element type"
+-
+-(* Return unsigned integer element the same width as argument. *)
+-let unsigned_of_elt elt =
+- elt_of_class_width Unsigned (elt_width elt)
+-
+-let signed_of_elt elt =
+- elt_of_class_width Signed (elt_width elt)
+-
+-(* Return untyped bits element the same width as argument. *)
+-let bits_of_elt elt =
+- elt_of_class_width Bits (elt_width elt)
+-
+-let non_signed_variant = function
+- S8 -> I8
+- | S16 -> I16
+- | S32 -> I32
+- | S64 -> I64
+- | U8 -> I8
+- | U16 -> I16
+- | U32 -> I32
+- | U64 -> I64
+- | x -> x
+-
+-let poly_unsigned_variant v =
+- let elclass = match elt_class v with
+- Poly -> Unsigned
+- | x -> x in
+- elt_of_class_width elclass (elt_width v)
+-
+-let widen_elt elt =
+- let w = elt_width elt
+- and c = elt_class elt in
+- elt_of_class_width c (w * 2)
+-
+-let narrow_elt elt =
+- let w = elt_width elt
+- and c = elt_class elt in
+- elt_of_class_width c (w / 2)
+-
+-(* If we're trying to find a mode from a "Use_operands" instruction, use the
+- last vector operand as the dominant mode used to invoke the correct builtin.
+- We must stick to this rule in neon.md. *)
+-let find_key_operand operands =
+- let rec scan opno =
+- match operands.(opno) with
+- Qreg -> Qreg
+- | Dreg -> Dreg
+- | VecArray (_, Qreg) -> Qreg
+- | VecArray (_, Dreg) -> Dreg
+- | _ -> scan (opno-1)
+- in
+- scan ((Array.length operands) - 1)
+-
+-(* Find a vecmode from a shape_elt ELT for an instruction with shape_form
+- SHAPE. For a Use_operands shape, if ARGPOS is passed then return the mode
+- for the given argument position, else determine which argument to return a
+- mode for automatically. *)
+-
+-let rec mode_of_elt ?argpos elt shape =
+- let flt = match elt_class elt with
+- Float | ConvClass(_, Float) -> true | _ -> false in
+- let idx =
+- match elt_width elt with
+- 8 -> 0 | 16 -> 1 | 32 -> 2 | 64 -> 3 | 128 -> 4
+- | _ -> failwith "Bad element width"
+- in match shape with
+- All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg
+- | Binary_imm Dreg | Long_noreg Dreg | Wide_noreg Dreg ->
+- if flt then
+- [| V8QI; V4HF; V2SF; DI |].(idx)
+- else
+- [| V8QI; V4HI; V2SI; DI |].(idx)
+- | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg
+- | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg ->
+- [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI; TI|].(idx)
+- | All (_, (Corereg | PtrTo _ | CstPtrTo _)) ->
+- [| QI; HI; if flt then SF else SI; DI |].(idx)
+- | Long | Wide | Wide_lane | Wide_scalar
+- | Long_imm ->
+- [| V8QI; V4HI; V2SI; DI |].(idx)
+- | Narrow | Narrow_imm -> [| V16QI; V8HI; V4SI; V2DI |].(idx)
+- | Use_operands ops ->
+- begin match argpos with
+- None -> mode_of_elt ?argpos elt (All (0, (find_key_operand ops)))
+- | Some pos -> mode_of_elt ?argpos elt (All (0, ops.(pos)))
+- end
+- | _ -> failwith "invalid shape"
+-
+-(* Modify an element type dependent on the shape of the instruction and the
+- operand number. *)
+-
+-let shapemap shape no =
+- let ident = fun x -> x in
+- match shape with
+- All _ | Use_operands _ | By_scalar _ | Pair_result _ | Unary_scalar _
+- | Binary_imm _ -> ident
+- | Long | Long_noreg _ | Wide_scalar | Long_imm ->
+- [| widen_elt; ident; ident |].(no)
+- | Wide | Wide_noreg _ -> [| widen_elt; widen_elt; ident |].(no)
+- | Wide_lane -> [| widen_elt; ident; ident; ident |].(no)
+- | Narrow | Narrow_imm -> [| narrow_elt; ident; ident |].(no)
+-
+-(* Register type (D/Q) of an operand, based on shape and operand number. *)
+-
+-let regmap shape no =
+- match shape with
+- All (_, reg) | Long_noreg reg | Wide_noreg reg -> reg
+- | Long -> [| Qreg; Dreg; Dreg |].(no)
+- | Wide -> [| Qreg; Qreg; Dreg |].(no)
+- | Narrow -> [| Dreg; Qreg; Qreg |].(no)
+- | Wide_lane -> [| Qreg; Dreg; Dreg; Immed |].(no)
+- | Wide_scalar -> [| Qreg; Dreg; Corereg |].(no)
+- | By_scalar reg -> [| reg; reg; Dreg; Immed |].(no)
+- | Unary_scalar reg -> [| reg; Dreg; Immed |].(no)
+- | Pair_result reg -> [| VecArray (2, reg); reg; reg |].(no)
+- | Binary_imm reg -> [| reg; reg; Immed |].(no)
+- | Long_imm -> [| Qreg; Dreg; Immed |].(no)
+- | Narrow_imm -> [| Dreg; Qreg; Immed |].(no)
+- | Use_operands these -> these.(no)
+-
+-let type_for_elt shape elt no =
+- let elt = (shapemap shape no) elt in
+- let reg = regmap shape no in
+- let rec type_for_reg_elt reg elt =
+- match reg with
+- Dreg ->
+- begin match elt with
+- S8 -> T_int8x8
+- | S16 -> T_int16x4
+- | S32 -> T_int32x2
+- | S64 -> T_int64x1
+- | U8 -> T_uint8x8
+- | U16 -> T_uint16x4
+- | U32 -> T_uint32x2
+- | U64 -> T_uint64x1
+- | P64 -> T_poly64x1
+- | P128 -> T_poly128
+- | F16 -> T_float16x4
+- | F32 -> T_float32x2
+- | P8 -> T_poly8x8
+- | P16 -> T_poly16x4
+- | _ -> failwith "Bad elt type for Dreg"
+- end
+- | Qreg ->
+- begin match elt with
+- S8 -> T_int8x16
+- | S16 -> T_int16x8
+- | S32 -> T_int32x4
+- | S64 -> T_int64x2
+- | U8 -> T_uint8x16
+- | U16 -> T_uint16x8
+- | U32 -> T_uint32x4
+- | U64 -> T_uint64x2
+- | F32 -> T_float32x4
+- | P8 -> T_poly8x16
+- | P16 -> T_poly16x8
+- | P64 -> T_poly64x2
+- | P128 -> T_poly128
+- | _ -> failwith "Bad elt type for Qreg"
+- end
+- | Corereg ->
+- begin match elt with
+- S8 -> T_int8
+- | S16 -> T_int16
+- | S32 -> T_int32
+- | S64 -> T_int64
+- | U8 -> T_uint8
+- | U16 -> T_uint16
+- | U32 -> T_uint32
+- | U64 -> T_uint64
+- | P8 -> T_poly8
+- | P16 -> T_poly16
+- | P64 -> T_poly64
+- | P128 -> T_poly128
+- | F32 -> T_float32
+- | _ -> failwith "Bad elt type for Corereg"
+- end
+- | Immed ->
+- T_immediate (0, 0)
+- | VecArray (num, sub) ->
+- T_arrayof (num, type_for_reg_elt sub elt)
+- | PtrTo x ->
+- T_ptrto (type_for_reg_elt x elt)
+- | CstPtrTo x ->
+- T_ptrto (T_const (type_for_reg_elt x elt))
+- (* Anything else is solely for the use of the test generator. *)
+- | _ -> assert false
+- in
+- type_for_reg_elt reg elt
+-
+-(* Return size of a vector type, in bits. *)
+-let vectype_size = function
+- T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1
+- | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1
+- | T_float32x2 | T_poly8x8 | T_poly64x1 | T_poly16x4 | T_float16x4 -> 64
+- | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2
+- | T_uint8x16 | T_uint16x8 | T_uint32x4 | T_uint64x2
+- | T_float32x4 | T_poly8x16 | T_poly64x2 | T_poly16x8 -> 128
+- | _ -> raise Not_found
+-
+-let inttype_for_array num elttype =
+- let eltsize = vectype_size elttype in
+- let numwords = (num * eltsize) / 32 in
+- match numwords with
+- 4 -> B_TImode
+- | 6 -> B_EImode
+- | 8 -> B_OImode
+- | 12 -> B_CImode
+- | 16 -> B_XImode
+- | _ -> failwith ("no int type for size " ^ string_of_int numwords)
+-
+-(* These functions return pairs of (internal, external) types, where "internal"
+- types are those seen by GCC, and "external" are those seen by the assembler.
+- These types aren't necessarily the same, since the intrinsics can munge more
+- than one C type into each assembler opcode. *)
+-
+-let make_sign_invariant func shape elt =
+- let arity, elt' = func shape elt in
+- arity, non_signed_variant elt'
+-
+-(* Don't restrict any types. *)
+-
+-let elts_same make_arity shape elt =
+- let vtype = type_for_elt shape elt in
+- make_arity vtype, elt
+-
+-(* As sign_invar_*, but when sign matters. *)
+-let elts_same_io_lane =
+- elts_same (fun vtype -> Arity4 (vtype 0, vtype 0, vtype 1, vtype 2, vtype 3))
+-
+-let elts_same_io =
+- elts_same (fun vtype -> Arity3 (vtype 0, vtype 0, vtype 1, vtype 2))
+-
+-let elts_same_2_lane =
+- elts_same (fun vtype -> Arity3 (vtype 0, vtype 1, vtype 2, vtype 3))
+-
+-let elts_same_3 = elts_same_2_lane
+-
+-let elts_same_2 =
+- elts_same (fun vtype -> Arity2 (vtype 0, vtype 1, vtype 2))
+-
+-let elts_same_1 =
+- elts_same (fun vtype -> Arity1 (vtype 0, vtype 1))
+-
+-(* Use for signed/unsigned invariant operations (i.e. where the operation
+- doesn't depend on the sign of the data. *)
+-
+-let sign_invar_io_lane = make_sign_invariant elts_same_io_lane
+-let sign_invar_io = make_sign_invariant elts_same_io
+-let sign_invar_2_lane = make_sign_invariant elts_same_2_lane
+-let sign_invar_2 = make_sign_invariant elts_same_2
+-let sign_invar_1 = make_sign_invariant elts_same_1
+-
+-(* Sign-sensitive comparison. *)
+-
+-let cmp_sign_matters shape elt =
+- let vtype = type_for_elt shape elt
+- and rtype = type_for_elt shape (unsigned_of_elt elt) 0 in
+- Arity2 (rtype, vtype 1, vtype 2), elt
+-
+-(* Signed/unsigned invariant comparison. *)
+-
+-let cmp_sign_invar shape elt =
+- let shape', elt' = cmp_sign_matters shape elt in
+- let elt'' =
+- match non_signed_variant elt' with
+- P8 -> I8
+- | x -> x
+- in
+- shape', elt''
+-
+-(* Comparison (VTST) where only the element width matters. *)
+-
+-let cmp_bits shape elt =
+- let vtype = type_for_elt shape elt
+- and rtype = type_for_elt shape (unsigned_of_elt elt) 0
+- and bits_only = bits_of_elt elt in
+- Arity2 (rtype, vtype 1, vtype 2), bits_only
+-
+-let reg_shift shape elt =
+- let vtype = type_for_elt shape elt
+- and op2type = type_for_elt shape (signed_of_elt elt) 2 in
+- Arity2 (vtype 0, vtype 1, op2type), elt
+-
+-(* Genericised constant-shift type-generating function. *)
+-
+-let const_shift mkimm ?arity ?result shape elt =
+- let op2type = (shapemap shape 2) elt in
+- let op2width = elt_width op2type in
+- let op2 = mkimm op2width
+- and op1 = type_for_elt shape elt 1
+- and r_elt =
+- match result with
+- None -> elt
+- | Some restriction -> restriction elt in
+- let rtype = type_for_elt shape r_elt 0 in
+- match arity with
+- None -> Arity2 (rtype, op1, op2), elt
+- | Some mkarity -> mkarity rtype op1 op2, elt
+-
+-(* Use for immediate right-shifts. *)
+-
+-let shift_right shape elt =
+- const_shift (fun imm -> T_immediate (1, imm)) shape elt
+-
+-let shift_right_acc shape elt =
+- const_shift (fun imm -> T_immediate (1, imm))
+- ~arity:(fun dst op1 op2 -> Arity3 (dst, dst, op1, op2)) shape elt
+-
+-(* Use for immediate right-shifts when the operation doesn't care about
+- signedness. *)
+-
+-let shift_right_sign_invar =
+- make_sign_invariant shift_right
+-
+-(* Immediate right-shift; result is unsigned even when operand is signed. *)
+-
+-let shift_right_to_uns shape elt =
+- const_shift (fun imm -> T_immediate (1, imm)) ~result:unsigned_of_elt
+- shape elt
+-
+-(* Immediate left-shift. *)
+-
+-let shift_left shape elt =
+- const_shift (fun imm -> T_immediate (0, imm - 1)) shape elt
+-
+-(* Immediate left-shift, unsigned result. *)
+-
+-let shift_left_to_uns shape elt =
+- const_shift (fun imm -> T_immediate (0, imm - 1)) ~result:unsigned_of_elt
+- shape elt
+-
+-(* Immediate left-shift, don't care about signs. *)
+-
+-let shift_left_sign_invar =
+- make_sign_invariant shift_left
+-
+-(* Shift left/right and insert: only element size matters. *)
+-
+-let shift_insert shape elt =
+- let arity, elt =
+- const_shift (fun imm -> T_immediate (1, imm))
+- ~arity:(fun dst op1 op2 -> Arity3 (dst, dst, op1, op2)) shape elt in
+- arity, bits_of_elt elt
+-
+-(* Get/set lane. *)
+-
+-let get_lane shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity2 (vtype 0, vtype 1, vtype 2),
+- (match elt with P8 -> U8 | P16 -> U16 | S32 | U32 | F32 -> B32 | x -> x)
+-
+-let set_lane shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), bits_of_elt elt
+-
+-let set_lane_notype shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), NoElts
+-
+-let create_vector shape elt =
+- let vtype = type_for_elt shape U64 1
+- and rtype = type_for_elt shape elt 0 in
+- Arity1 (rtype, vtype), elt
+-
+-let conv make_arity shape elt =
+- let edest, esrc = match elt with
+- Conv (edest, esrc) | Cast (edest, esrc) -> edest, esrc
+- | _ -> failwith "Non-conversion element in conversion" in
+- let vtype = type_for_elt shape esrc
+- and rtype = type_for_elt shape edest 0 in
+- make_arity rtype vtype, elt
+-
+-let conv_1 = conv (fun rtype vtype -> Arity1 (rtype, vtype 1))
+-let conv_2 = conv (fun rtype vtype -> Arity2 (rtype, vtype 1, vtype 2))
+-
+-(* Operation has an unsigned result even if operands are signed. *)
+-
+-let dst_unsign make_arity shape elt =
+- let vtype = type_for_elt shape elt
+- and rtype = type_for_elt shape (unsigned_of_elt elt) 0 in
+- make_arity rtype vtype, elt
+-
+-let dst_unsign_1 = dst_unsign (fun rtype vtype -> Arity1 (rtype, vtype 1))
+-
+-let make_bits_only func shape elt =
+- let arity, elt' = func shape elt in
+- arity, bits_of_elt elt'
+-
+-(* Extend operation. *)
+-
+-let extend shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity3 (vtype 0, vtype 1, vtype 2, vtype 3), bits_of_elt elt
+-
+-(* Table look-up operations. Operand 2 is signed/unsigned for signed/unsigned
+- integer ops respectively, or unsigned for polynomial ops. *)
+-
+-let table mkarity shape elt =
+- let vtype = type_for_elt shape elt in
+- let op2 = type_for_elt shape (poly_unsigned_variant elt) 2 in
+- mkarity vtype op2, bits_of_elt elt
+-
+-let table_2 = table (fun vtype op2 -> Arity2 (vtype 0, vtype 1, op2))
+-let table_io = table (fun vtype op2 -> Arity3 (vtype 0, vtype 0, vtype 1, op2))
+-
+-(* Operations where only bits matter. *)
+-
+-let bits_1 = make_bits_only elts_same_1
+-let bits_2 = make_bits_only elts_same_2
+-let bits_3 = make_bits_only elts_same_3
+-
+-(* Store insns. *)
+-let store_1 shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity2 (T_void, vtype 0, vtype 1), bits_of_elt elt
+-
+-let store_3 shape elt =
+- let vtype = type_for_elt shape elt in
+- Arity3 (T_void, vtype 0, vtype 1, vtype 2), bits_of_elt elt
+-
+-let make_notype func shape elt =
+- let arity, _ = func shape elt in
+- arity, NoElts
+-
+-let notype_1 = make_notype elts_same_1
+-let notype_2 = make_notype elts_same_2
+-let notype_3 = make_notype elts_same_3
+-
+-(* Bit-select operations (first operand is unsigned int). *)
+-
+-let bit_select shape elt =
+- let vtype = type_for_elt shape elt
+- and itype = type_for_elt shape (unsigned_of_elt elt) in
+- Arity3 (vtype 0, itype 1, vtype 2, vtype 3), NoElts
+-
+-(* Common lists of supported element types. *)
+-
+-let s_8_32 = [S8; S16; S32]
+-let u_8_32 = [U8; U16; U32]
+-let su_8_32 = [S8; S16; S32; U8; U16; U32]
+-let su_8_64 = S64 :: U64 :: su_8_32
+-let su_16_64 = [S16; S32; S64; U16; U32; U64]
+-let pf_su_8_16 = [P8; P16; S8; S16; U8; U16]
+-let pf_su_8_32 = P8 :: P16 :: F32 :: su_8_32
+-let pf_su_8_64 = P8 :: P16 :: F32 :: su_8_64
+-let suf_32 = [S32; U32; F32]
+-
+-let ops =
+- [
+- (* Addition. *)
+- Vadd, [], All (3, Dreg), "vadd", sign_invar_2, F32 :: su_8_32;
+- Vadd, [No_op], All (3, Dreg), "vadd", sign_invar_2, [S64; U64];
+- Vadd, [], All (3, Qreg), "vaddQ", sign_invar_2, F32 :: su_8_64;
+- Vadd, [], Long, "vaddl", elts_same_2, su_8_32;
+- Vadd, [], Wide, "vaddw", elts_same_2, su_8_32;
+- Vadd, [Halving], All (3, Dreg), "vhadd", elts_same_2, su_8_32;
+- Vadd, [Halving], All (3, Qreg), "vhaddQ", elts_same_2, su_8_32;
+- Vadd, [Instruction_name ["vrhadd"]; Rounding; Halving],
+- All (3, Dreg), "vRhadd", elts_same_2, su_8_32;
+- Vadd, [Instruction_name ["vrhadd"]; Rounding; Halving],
+- All (3, Qreg), "vRhaddQ", elts_same_2, su_8_32;
+- Vadd, [Saturating], All (3, Dreg), "vqadd", elts_same_2, su_8_64;
+- Vadd, [Saturating], All (3, Qreg), "vqaddQ", elts_same_2, su_8_64;
+- Vadd, [High_half], Narrow, "vaddhn", sign_invar_2, su_16_64;
+- Vadd, [Instruction_name ["vraddhn"]; Rounding; High_half],
+- Narrow, "vRaddhn", sign_invar_2, su_16_64;
+-
+- (* Multiplication. *)
+- Vmul, [], All (3, Dreg), "vmul", sign_invar_2, P8 :: F32 :: su_8_32;
+- Vmul, [], All (3, Qreg), "vmulQ", sign_invar_2, P8 :: F32 :: su_8_32;
+- Vmul, [Saturating; Doubling; High_half], All (3, Dreg), "vqdmulh",
+- elts_same_2, [S16; S32];
+- Vmul, [Saturating; Doubling; High_half], All (3, Qreg), "vqdmulhQ",
+- elts_same_2, [S16; S32];
+- Vmul,
+- [Saturating; Rounding; Doubling; High_half;
+- Instruction_name ["vqrdmulh"]],
+- All (3, Dreg), "vqRdmulh",
+- elts_same_2, [S16; S32];
+- Vmul,
+- [Saturating; Rounding; Doubling; High_half;
+- Instruction_name ["vqrdmulh"]],
+- All (3, Qreg), "vqRdmulhQ",
+- elts_same_2, [S16; S32];
+- Vmul, [], Long, "vmull", elts_same_2, P8 :: su_8_32;
+- Vmul, [Saturating; Doubling], Long, "vqdmull", elts_same_2, [S16; S32];
+-
+- (* Multiply-accumulate. *)
+- Vmla, [], All (3, Dreg), "vmla", sign_invar_io, F32 :: su_8_32;
+- Vmla, [], All (3, Qreg), "vmlaQ", sign_invar_io, F32 :: su_8_32;
+- Vmla, [], Long, "vmlal", elts_same_io, su_8_32;
+- Vmla, [Saturating; Doubling], Long, "vqdmlal", elts_same_io, [S16; S32];
+-
+- (* Multiply-subtract. *)
+- Vmls, [], All (3, Dreg), "vmls", sign_invar_io, F32 :: su_8_32;
+- Vmls, [], All (3, Qreg), "vmlsQ", sign_invar_io, F32 :: su_8_32;
+- Vmls, [], Long, "vmlsl", elts_same_io, su_8_32;
+- Vmls, [Saturating; Doubling], Long, "vqdmlsl", elts_same_io, [S16; S32];
+-
+- (* Fused-multiply-accumulate. *)
+- Vfma, [Requires_feature "FMA"], All (3, Dreg), "vfma", elts_same_io, [F32];
+- Vfma, [Requires_feature "FMA"], All (3, Qreg), "vfmaQ", elts_same_io, [F32];
+- Vfms, [Requires_feature "FMA"], All (3, Dreg), "vfms", elts_same_io, [F32];
+- Vfms, [Requires_feature "FMA"], All (3, Qreg), "vfmsQ", elts_same_io, [F32];
+-
+- (* Round to integral. *)
+- Vrintn, [Builtin_name "vrintn"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+- "vrndn", elts_same_1, [F32];
+- Vrintn, [Builtin_name "vrintn"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+- "vrndqn", elts_same_1, [F32];
+- Vrinta, [Builtin_name "vrinta"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+- "vrnda", elts_same_1, [F32];
+- Vrinta, [Builtin_name "vrinta"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+- "vrndqa", elts_same_1, [F32];
+- Vrintp, [Builtin_name "vrintp"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+- "vrndp", elts_same_1, [F32];
+- Vrintp, [Builtin_name "vrintp"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+- "vrndqp", elts_same_1, [F32];
+- Vrintm, [Builtin_name "vrintm"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+- "vrndm", elts_same_1, [F32];
+- Vrintm, [Builtin_name "vrintm"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+- "vrndqm", elts_same_1, [F32];
+- Vrintz, [Builtin_name "vrintz"; Requires_arch 8], Use_operands [| Dreg; Dreg |],
+- "vrnd", elts_same_1, [F32];
+- Vrintz, [Builtin_name "vrintz"; Requires_arch 8], Use_operands [| Qreg; Qreg |],
+- "vrndq", elts_same_1, [F32];
+- (* Subtraction. *)
+- Vsub, [], All (3, Dreg), "vsub", sign_invar_2, F32 :: su_8_32;
+- Vsub, [No_op], All (3, Dreg), "vsub", sign_invar_2, [S64; U64];
+- Vsub, [], All (3, Qreg), "vsubQ", sign_invar_2, F32 :: su_8_64;
+- Vsub, [], Long, "vsubl", elts_same_2, su_8_32;
+- Vsub, [], Wide, "vsubw", elts_same_2, su_8_32;
+- Vsub, [Halving], All (3, Dreg), "vhsub", elts_same_2, su_8_32;
+- Vsub, [Halving], All (3, Qreg), "vhsubQ", elts_same_2, su_8_32;
+- Vsub, [Saturating], All (3, Dreg), "vqsub", elts_same_2, su_8_64;
+- Vsub, [Saturating], All (3, Qreg), "vqsubQ", elts_same_2, su_8_64;
+- Vsub, [High_half], Narrow, "vsubhn", sign_invar_2, su_16_64;
+- Vsub, [Instruction_name ["vrsubhn"]; Rounding; High_half],
+- Narrow, "vRsubhn", sign_invar_2, su_16_64;
+-
+- (* Comparison, equal. *)
+- Vceq, [], All (3, Dreg), "vceq", cmp_sign_invar, P8 :: F32 :: su_8_32;
+- Vceq, [], All (3, Qreg), "vceqQ", cmp_sign_invar, P8 :: F32 :: su_8_32;
+-
+- (* Comparison, greater-than or equal. *)
+- Vcge, [], All (3, Dreg), "vcge", cmp_sign_matters, F32 :: s_8_32;
+- Vcge, [Instruction_name ["vcge"]; Builtin_name "vcgeu"],
+- All (3, Dreg), "vcge", cmp_sign_matters,
+- u_8_32;
+- Vcge, [], All (3, Qreg), "vcgeQ", cmp_sign_matters, F32 :: s_8_32;
+- Vcge, [Instruction_name ["vcge"]; Builtin_name "vcgeu"],
+- All (3, Qreg), "vcgeQ", cmp_sign_matters,
+- u_8_32;
+-
+- (* Comparison, less-than or equal. *)
+- Vcle, [Flipped "vcge"], All (3, Dreg), "vcle", cmp_sign_matters,
+- F32 :: s_8_32;
+- Vcle, [Instruction_name ["vcge"]; Flipped "vcgeu"],
+- All (3, Dreg), "vcle", cmp_sign_matters,
+- u_8_32;
+- Vcle, [Instruction_name ["vcge"]; Flipped "vcgeQ"],
+- All (3, Qreg), "vcleQ", cmp_sign_matters,
+- F32 :: s_8_32;
+- Vcle, [Instruction_name ["vcge"]; Flipped "vcgeuQ"],
+- All (3, Qreg), "vcleQ", cmp_sign_matters,
+- u_8_32;
+-
+- (* Comparison, greater-than. *)
+- Vcgt, [], All (3, Dreg), "vcgt", cmp_sign_matters, F32 :: s_8_32;
+- Vcgt, [Instruction_name ["vcgt"]; Builtin_name "vcgtu"],
+- All (3, Dreg), "vcgt", cmp_sign_matters,
+- u_8_32;
+- Vcgt, [], All (3, Qreg), "vcgtQ", cmp_sign_matters, F32 :: s_8_32;
+- Vcgt, [Instruction_name ["vcgt"]; Builtin_name "vcgtu"],
+- All (3, Qreg), "vcgtQ", cmp_sign_matters,
+- u_8_32;
+-
+- (* Comparison, less-than. *)
+- Vclt, [Flipped "vcgt"], All (3, Dreg), "vclt", cmp_sign_matters,
+- F32 :: s_8_32;
+- Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtu"],
+- All (3, Dreg), "vclt", cmp_sign_matters,
+- u_8_32;
+- Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtQ"],
+- All (3, Qreg), "vcltQ", cmp_sign_matters,
+- F32 :: s_8_32;
+- Vclt, [Instruction_name ["vcgt"]; Flipped "vcgtuQ"],
+- All (3, Qreg), "vcltQ", cmp_sign_matters,
+- u_8_32;
+-
+- (* Compare absolute greater-than or equal. *)
+- Vcage, [Instruction_name ["vacge"]],
+- All (3, Dreg), "vcage", cmp_sign_matters, [F32];
+- Vcage, [Instruction_name ["vacge"]],
+- All (3, Qreg), "vcageQ", cmp_sign_matters, [F32];
+-
+- (* Compare absolute less-than or equal. *)
+- Vcale, [Instruction_name ["vacge"]; Flipped "vcage"],
+- All (3, Dreg), "vcale", cmp_sign_matters, [F32];
+- Vcale, [Instruction_name ["vacge"]; Flipped "vcageQ"],
+- All (3, Qreg), "vcaleQ", cmp_sign_matters, [F32];
+-
+- (* Compare absolute greater-than or equal. *)
+- Vcagt, [Instruction_name ["vacgt"]],
+- All (3, Dreg), "vcagt", cmp_sign_matters, [F32];
+- Vcagt, [Instruction_name ["vacgt"]],
+- All (3, Qreg), "vcagtQ", cmp_sign_matters, [F32];
+-
+- (* Compare absolute less-than or equal. *)
+- Vcalt, [Instruction_name ["vacgt"]; Flipped "vcagt"],
+- All (3, Dreg), "vcalt", cmp_sign_matters, [F32];
+- Vcalt, [Instruction_name ["vacgt"]; Flipped "vcagtQ"],
+- All (3, Qreg), "vcaltQ", cmp_sign_matters, [F32];
+-
+- (* Test bits. *)
+- Vtst, [], All (3, Dreg), "vtst", cmp_bits, P8 :: su_8_32;
+- Vtst, [], All (3, Qreg), "vtstQ", cmp_bits, P8 :: su_8_32;
+-
+- (* Absolute difference. *)
+- Vabd, [], All (3, Dreg), "vabd", elts_same_2, F32 :: su_8_32;
+- Vabd, [], All (3, Qreg), "vabdQ", elts_same_2, F32 :: su_8_32;
+- Vabd, [], Long, "vabdl", elts_same_2, su_8_32;
+-
+- (* Absolute difference and accumulate. *)
+- Vaba, [], All (3, Dreg), "vaba", elts_same_io, su_8_32;
+- Vaba, [], All (3, Qreg), "vabaQ", elts_same_io, su_8_32;
+- Vaba, [], Long, "vabal", elts_same_io, su_8_32;
+-
+- (* Max. *)
+- Vmax, [], All (3, Dreg), "vmax", elts_same_2, F32 :: su_8_32;
+- Vmax, [], All (3, Qreg), "vmaxQ", elts_same_2, F32 :: su_8_32;
+-
+- (* Min. *)
+- Vmin, [], All (3, Dreg), "vmin", elts_same_2, F32 :: su_8_32;
+- Vmin, [], All (3, Qreg), "vminQ", elts_same_2, F32 :: su_8_32;
+-
+- (* Pairwise add. *)
+- Vpadd, [], All (3, Dreg), "vpadd", sign_invar_2, F32 :: su_8_32;
+- Vpadd, [], Long_noreg Dreg, "vpaddl", elts_same_1, su_8_32;
+- Vpadd, [], Long_noreg Qreg, "vpaddlQ", elts_same_1, su_8_32;
+-
+- (* Pairwise add, widen and accumulate. *)
+- Vpada, [], Wide_noreg Dreg, "vpadal", elts_same_2, su_8_32;
+- Vpada, [], Wide_noreg Qreg, "vpadalQ", elts_same_2, su_8_32;
+-
+- (* Folding maximum, minimum. *)
+- Vpmax, [], All (3, Dreg), "vpmax", elts_same_2, F32 :: su_8_32;
+- Vpmin, [], All (3, Dreg), "vpmin", elts_same_2, F32 :: su_8_32;
+-
+- (* Reciprocal step. *)
+- Vrecps, [], All (3, Dreg), "vrecps", elts_same_2, [F32];
+- Vrecps, [], All (3, Qreg), "vrecpsQ", elts_same_2, [F32];
+- Vrsqrts, [], All (3, Dreg), "vrsqrts", elts_same_2, [F32];
+- Vrsqrts, [], All (3, Qreg), "vrsqrtsQ", elts_same_2, [F32];
+-
+- (* Vector shift left. *)
+- Vshl, [], All (3, Dreg), "vshl", reg_shift, su_8_64;
+- Vshl, [], All (3, Qreg), "vshlQ", reg_shift, su_8_64;
+- Vshl, [Instruction_name ["vrshl"]; Rounding],
+- All (3, Dreg), "vRshl", reg_shift, su_8_64;
+- Vshl, [Instruction_name ["vrshl"]; Rounding],
+- All (3, Qreg), "vRshlQ", reg_shift, su_8_64;
+- Vshl, [Saturating], All (3, Dreg), "vqshl", reg_shift, su_8_64;
+- Vshl, [Saturating], All (3, Qreg), "vqshlQ", reg_shift, su_8_64;
+- Vshl, [Instruction_name ["vqrshl"]; Saturating; Rounding],
+- All (3, Dreg), "vqRshl", reg_shift, su_8_64;
+- Vshl, [Instruction_name ["vqrshl"]; Saturating; Rounding],
+- All (3, Qreg), "vqRshlQ", reg_shift, su_8_64;
+-
+- (* Vector shift right by constant. *)
+- Vshr_n, [], Binary_imm Dreg, "vshr_n", shift_right, su_8_64;
+- Vshr_n, [], Binary_imm Qreg, "vshrQ_n", shift_right, su_8_64;
+- Vshr_n, [Instruction_name ["vrshr"]; Rounding], Binary_imm Dreg,
+- "vRshr_n", shift_right, su_8_64;
+- Vshr_n, [Instruction_name ["vrshr"]; Rounding], Binary_imm Qreg,
+- "vRshrQ_n", shift_right, su_8_64;
+- Vshr_n, [], Narrow_imm, "vshrn_n", shift_right_sign_invar, su_16_64;
+- Vshr_n, [Instruction_name ["vrshrn"]; Rounding], Narrow_imm, "vRshrn_n",
+- shift_right_sign_invar, su_16_64;
+- Vshr_n, [Saturating], Narrow_imm, "vqshrn_n", shift_right, su_16_64;
+- Vshr_n, [Instruction_name ["vqrshrn"]; Saturating; Rounding], Narrow_imm,
+- "vqRshrn_n", shift_right, su_16_64;
+- Vshr_n, [Saturating; Dst_unsign], Narrow_imm, "vqshrun_n",
+- shift_right_to_uns, [S16; S32; S64];
+- Vshr_n, [Instruction_name ["vqrshrun"]; Saturating; Dst_unsign; Rounding],
+- Narrow_imm, "vqRshrun_n", shift_right_to_uns, [S16; S32; S64];
+-
+- (* Vector shift left by constant. *)
+- Vshl_n, [], Binary_imm Dreg, "vshl_n", shift_left_sign_invar, su_8_64;
+- Vshl_n, [], Binary_imm Qreg, "vshlQ_n", shift_left_sign_invar, su_8_64;
+- Vshl_n, [Saturating], Binary_imm Dreg, "vqshl_n", shift_left, su_8_64;
+- Vshl_n, [Saturating], Binary_imm Qreg, "vqshlQ_n", shift_left, su_8_64;
+- Vshl_n, [Saturating; Dst_unsign], Binary_imm Dreg, "vqshlu_n",
+- shift_left_to_uns, [S8; S16; S32; S64];
+- Vshl_n, [Saturating; Dst_unsign], Binary_imm Qreg, "vqshluQ_n",
+- shift_left_to_uns, [S8; S16; S32; S64];
+- Vshl_n, [], Long_imm, "vshll_n", shift_left, su_8_32;
+-
+- (* Vector shift right by constant and accumulate. *)
+- Vsra_n, [], Binary_imm Dreg, "vsra_n", shift_right_acc, su_8_64;
+- Vsra_n, [], Binary_imm Qreg, "vsraQ_n", shift_right_acc, su_8_64;
+- Vsra_n, [Instruction_name ["vrsra"]; Rounding], Binary_imm Dreg,
+- "vRsra_n", shift_right_acc, su_8_64;
+- Vsra_n, [Instruction_name ["vrsra"]; Rounding], Binary_imm Qreg,
+- "vRsraQ_n", shift_right_acc, su_8_64;
+-
+- (* Vector shift right and insert. *)
+- Vsri, [Requires_feature "CRYPTO"], Use_operands [| Dreg; Dreg; Immed |], "vsri_n", shift_insert,
+- [P64];
+- Vsri, [], Use_operands [| Dreg; Dreg; Immed |], "vsri_n", shift_insert,
+- P8 :: P16 :: su_8_64;
+- Vsri, [Requires_feature "CRYPTO"], Use_operands [| Qreg; Qreg; Immed |], "vsriQ_n", shift_insert,
+- [P64];
+- Vsri, [], Use_operands [| Qreg; Qreg; Immed |], "vsriQ_n", shift_insert,
+- P8 :: P16 :: su_8_64;
+-
+- (* Vector shift left and insert. *)
+- Vsli, [Requires_feature "CRYPTO"], Use_operands [| Dreg; Dreg; Immed |], "vsli_n", shift_insert,
+- [P64];
+- Vsli, [], Use_operands [| Dreg; Dreg; Immed |], "vsli_n", shift_insert,
+- P8 :: P16 :: su_8_64;
+- Vsli, [Requires_feature "CRYPTO"], Use_operands [| Qreg; Qreg; Immed |], "vsliQ_n", shift_insert,
+- [P64];
+- Vsli, [], Use_operands [| Qreg; Qreg; Immed |], "vsliQ_n", shift_insert,
+- P8 :: P16 :: su_8_64;
+-
+- (* Absolute value. *)
+- Vabs, [], All (2, Dreg), "vabs", elts_same_1, [S8; S16; S32; F32];
+- Vabs, [], All (2, Qreg), "vabsQ", elts_same_1, [S8; S16; S32; F32];
+- Vabs, [Saturating], All (2, Dreg), "vqabs", elts_same_1, [S8; S16; S32];
+- Vabs, [Saturating], All (2, Qreg), "vqabsQ", elts_same_1, [S8; S16; S32];
+-
+- (* Negate. *)
+- Vneg, [], All (2, Dreg), "vneg", elts_same_1, [S8; S16; S32; F32];
+- Vneg, [], All (2, Qreg), "vnegQ", elts_same_1, [S8; S16; S32; F32];
+- Vneg, [Saturating], All (2, Dreg), "vqneg", elts_same_1, [S8; S16; S32];
+- Vneg, [Saturating], All (2, Qreg), "vqnegQ", elts_same_1, [S8; S16; S32];
+-
+- (* Bitwise not. *)
+- Vmvn, [], All (2, Dreg), "vmvn", notype_1, P8 :: su_8_32;
+- Vmvn, [], All (2, Qreg), "vmvnQ", notype_1, P8 :: su_8_32;
+-
+- (* Count leading sign bits. *)
+- Vcls, [], All (2, Dreg), "vcls", elts_same_1, [S8; S16; S32];
+- Vcls, [], All (2, Qreg), "vclsQ", elts_same_1, [S8; S16; S32];
+-
+- (* Count leading zeros. *)
+- Vclz, [], All (2, Dreg), "vclz", sign_invar_1, su_8_32;
+- Vclz, [], All (2, Qreg), "vclzQ", sign_invar_1, su_8_32;
+-
+- (* Count number of set bits. *)
+- Vcnt, [], All (2, Dreg), "vcnt", bits_1, [P8; S8; U8];
+- Vcnt, [], All (2, Qreg), "vcntQ", bits_1, [P8; S8; U8];
+-
+- (* Reciprocal estimate. *)
+- Vrecpe, [], All (2, Dreg), "vrecpe", elts_same_1, [U32; F32];
+- Vrecpe, [], All (2, Qreg), "vrecpeQ", elts_same_1, [U32; F32];
+-
+- (* Reciprocal square-root estimate. *)
+- Vrsqrte, [], All (2, Dreg), "vrsqrte", elts_same_1, [U32; F32];
+- Vrsqrte, [], All (2, Qreg), "vrsqrteQ", elts_same_1, [U32; F32];
+-
+- (* Get lanes from a vector. *)
+- Vget_lane,
+- [InfoWord; Disassembles_as [Use_operands [| Corereg; Element_of_dreg |]];
+- Instruction_name ["vmov"]],
+- Use_operands [| Corereg; Dreg; Immed |],
+- "vget_lane", get_lane, pf_su_8_32;
+- Vget_lane,
+- [No_op;
+- InfoWord;
+- Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]];
+- Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)],
+- Use_operands [| Corereg; Dreg; Immed |],
+- "vget_lane", notype_2, [S64; U64];
+- Vget_lane,
+- [InfoWord; Disassembles_as [Use_operands [| Corereg; Element_of_dreg |]];
+- Instruction_name ["vmov"]],
+- Use_operands [| Corereg; Qreg; Immed |],
+- "vgetQ_lane", get_lane, pf_su_8_32;
+- Vget_lane,
+- [InfoWord;
+- Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]];
+- Instruction_name ["vmov"; "fmrrd"]; Const_valuator (fun _ -> 0);
+- Fixed_core_reg],
+- Use_operands [| Corereg; Qreg; Immed |],
+- "vgetQ_lane", notype_2, [S64; U64];
+-
+- (* Set lanes in a vector. *)
+- Vset_lane, [Disassembles_as [Use_operands [| Element_of_dreg; Corereg |]];
+- Instruction_name ["vmov"]],
+- Use_operands [| Dreg; Corereg; Dreg; Immed |], "vset_lane",
+- set_lane, pf_su_8_32;
+- Vset_lane, [No_op;
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]];
+- Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)],
+- Use_operands [| Dreg; Corereg; Dreg; Immed |], "vset_lane",
+- set_lane_notype, [S64; U64];
+- Vset_lane, [Disassembles_as [Use_operands [| Element_of_dreg; Corereg |]];
+- Instruction_name ["vmov"]],
+- Use_operands [| Qreg; Corereg; Qreg; Immed |], "vsetQ_lane",
+- set_lane, pf_su_8_32;
+- Vset_lane, [Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]];
+- Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)],
+- Use_operands [| Qreg; Corereg; Qreg; Immed |], "vsetQ_lane",
+- set_lane_notype, [S64; U64];
+-
+- (* Create vector from literal bit pattern. *)
+- Vcreate,
+- [Requires_feature "CRYPTO"; No_op], (* Not really, but it can yield various things that are too
+- hard for the test generator at this time. *)
+- Use_operands [| Dreg; Corereg |], "vcreate", create_vector,
+- [P64];
+- Vcreate,
+- [No_op], (* Not really, but it can yield various things that are too
+- hard for the test generator at this time. *)
+- Use_operands [| Dreg; Corereg |], "vcreate", create_vector,
+- pf_su_8_64;
+-
+- (* Set all lanes to the same value. *)
+- Vdup_n,
+- [Disassembles_as [Use_operands [| Dreg;
+- Alternatives [ Corereg;
+- Element_of_dreg ] |]]],
+- Use_operands [| Dreg; Corereg |], "vdup_n", bits_1,
+- pf_su_8_32;
+- Vdup_n,
+- [No_op; Requires_feature "CRYPTO";
+- Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]],
+- Use_operands [| Dreg; Corereg |], "vdup_n", notype_1,
+- [P64];
+- Vdup_n,
+- [No_op;
+- Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]],
+- Use_operands [| Dreg; Corereg |], "vdup_n", notype_1,
+- [S64; U64];
+- Vdup_n,
+- [No_op; Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| Qreg;
+- Alternatives [ Corereg;
+- Element_of_dreg ] |]]],
+- Use_operands [| Qreg; Corereg |], "vdupQ_n", bits_1,
+- [P64];
+- Vdup_n,
+- [Disassembles_as [Use_operands [| Qreg;
+- Alternatives [ Corereg;
+- Element_of_dreg ] |]]],
+- Use_operands [| Qreg; Corereg |], "vdupQ_n", bits_1,
+- pf_su_8_32;
+- Vdup_n,
+- [No_op;
+- Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |];
+- Use_operands [| Dreg; Corereg; Corereg |]]],
+- Use_operands [| Qreg; Corereg |], "vdupQ_n", notype_1,
+- [S64; U64];
+-
+- (* These are just aliases for the above. *)
+- Vmov_n,
+- [Builtin_name "vdup_n";
+- Disassembles_as [Use_operands [| Dreg;
+- Alternatives [ Corereg;
+- Element_of_dreg ] |]]],
+- Use_operands [| Dreg; Corereg |],
+- "vmov_n", bits_1, pf_su_8_32;
+- Vmov_n,
+- [No_op;
+- Builtin_name "vdup_n";
+- Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]],
+- Use_operands [| Dreg; Corereg |],
+- "vmov_n", notype_1, [S64; U64];
+- Vmov_n,
+- [Builtin_name "vdupQ_n";
+- Disassembles_as [Use_operands [| Qreg;
+- Alternatives [ Corereg;
+- Element_of_dreg ] |]]],
+- Use_operands [| Qreg; Corereg |],
+- "vmovQ_n", bits_1, pf_su_8_32;
+- Vmov_n,
+- [No_op;
+- Builtin_name "vdupQ_n";
+- Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |];
+- Use_operands [| Dreg; Corereg; Corereg |]]],
+- Use_operands [| Qreg; Corereg |],
+- "vmovQ_n", notype_1, [S64; U64];
+-
+- (* Duplicate, lane version. We can't use Use_operands here because the
+- rightmost register (always Dreg) would be picked up by find_key_operand,
+- when we want the leftmost register to be used in this case (otherwise
+- the modes are indistinguishable in neon.md, etc. *)
+- Vdup_lane,
+- [Disassembles_as [Use_operands [| Dreg; Element_of_dreg |]]],
+- Unary_scalar Dreg, "vdup_lane", bits_2, pf_su_8_32;
+- Vdup_lane,
+- [No_op; Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+- Unary_scalar Dreg, "vdup_lane", bits_2, [P64];
+- Vdup_lane,
+- [No_op; Const_valuator (fun _ -> 0)],
+- Unary_scalar Dreg, "vdup_lane", bits_2, [S64; U64];
+- Vdup_lane,
+- [Disassembles_as [Use_operands [| Qreg; Element_of_dreg |]]],
+- Unary_scalar Qreg, "vdupQ_lane", bits_2, pf_su_8_32;
+- Vdup_lane,
+- [No_op; Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+- Unary_scalar Qreg, "vdupQ_lane", bits_2, [P64];
+- Vdup_lane,
+- [No_op; Const_valuator (fun _ -> 0)],
+- Unary_scalar Qreg, "vdupQ_lane", bits_2, [S64; U64];
+-
+- (* Combining vectors. *)
+- Vcombine, [Requires_feature "CRYPTO"; No_op],
+- Use_operands [| Qreg; Dreg; Dreg |], "vcombine", notype_2,
+- [P64];
+- Vcombine, [No_op],
+- Use_operands [| Qreg; Dreg; Dreg |], "vcombine", notype_2,
+- pf_su_8_64;
+-
+- (* Splitting vectors. *)
+- Vget_high, [Requires_feature "CRYPTO"; No_op],
+- Use_operands [| Dreg; Qreg |], "vget_high",
+- notype_1, [P64];
+- Vget_high, [No_op],
+- Use_operands [| Dreg; Qreg |], "vget_high",
+- notype_1, pf_su_8_64;
+- Vget_low, [Instruction_name ["vmov"];
+- Disassembles_as [Use_operands [| Dreg; Dreg |]];
+- Fixed_vector_reg],
+- Use_operands [| Dreg; Qreg |], "vget_low",
+- notype_1, pf_su_8_32;
+- Vget_low, [Requires_feature "CRYPTO"; No_op],
+- Use_operands [| Dreg; Qreg |], "vget_low",
+- notype_1, [P64];
+- Vget_low, [No_op],
+- Use_operands [| Dreg; Qreg |], "vget_low",
+- notype_1, [S64; U64];
+-
+- (* Conversions. *)
+- Vcvt, [InfoWord], All (2, Dreg), "vcvt", conv_1,
+- [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+- Vcvt, [InfoWord], All (2, Qreg), "vcvtQ", conv_1,
+- [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+- Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+- Use_operands [| Dreg; Qreg; |], "vcvt", conv_1, [Conv (F16, F32)];
+- Vcvt, [Builtin_name "vcvt" ; Requires_FP_bit 1],
+- Use_operands [| Qreg; Dreg; |], "vcvt", conv_1, [Conv (F32, F16)];
+- Vcvt_n, [InfoWord], Use_operands [| Dreg; Dreg; Immed |], "vcvt_n", conv_2,
+- [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+- Vcvt_n, [InfoWord], Use_operands [| Qreg; Qreg; Immed |], "vcvtQ_n", conv_2,
+- [Conv (S32, F32); Conv (U32, F32); Conv (F32, S32); Conv (F32, U32)];
+-
+- (* Move, narrowing. *)
+- Vmovn, [Disassembles_as [Use_operands [| Dreg; Qreg |]]],
+- Narrow, "vmovn", sign_invar_1, su_16_64;
+- Vmovn, [Disassembles_as [Use_operands [| Dreg; Qreg |]]; Saturating],
+- Narrow, "vqmovn", elts_same_1, su_16_64;
+- Vmovn,
+- [Disassembles_as [Use_operands [| Dreg; Qreg |]]; Saturating; Dst_unsign],
+- Narrow, "vqmovun", dst_unsign_1,
+- [S16; S32; S64];
+-
+- (* Move, long. *)
+- Vmovl, [Disassembles_as [Use_operands [| Qreg; Dreg |]]],
+- Long, "vmovl", elts_same_1, su_8_32;
+-
+- (* Table lookup. *)
+- Vtbl 1,
+- [Instruction_name ["vtbl"];
+- Disassembles_as [Use_operands [| Dreg; VecArray (1, Dreg); Dreg |]]],
+- Use_operands [| Dreg; Dreg; Dreg |], "vtbl1", table_2, [U8; S8; P8];
+- Vtbl 2, [Instruction_name ["vtbl"]],
+- Use_operands [| Dreg; VecArray (2, Dreg); Dreg |], "vtbl2", table_2,
+- [U8; S8; P8];
+- Vtbl 3, [Instruction_name ["vtbl"]],
+- Use_operands [| Dreg; VecArray (3, Dreg); Dreg |], "vtbl3", table_2,
+- [U8; S8; P8];
+- Vtbl 4, [Instruction_name ["vtbl"]],
+- Use_operands [| Dreg; VecArray (4, Dreg); Dreg |], "vtbl4", table_2,
+- [U8; S8; P8];
+-
+- (* Extended table lookup. *)
+- Vtbx 1,
+- [Instruction_name ["vtbx"];
+- Disassembles_as [Use_operands [| Dreg; VecArray (1, Dreg); Dreg |]]],
+- Use_operands [| Dreg; Dreg; Dreg |], "vtbx1", table_io, [U8; S8; P8];
+- Vtbx 2, [Instruction_name ["vtbx"]],
+- Use_operands [| Dreg; VecArray (2, Dreg); Dreg |], "vtbx2", table_io,
+- [U8; S8; P8];
+- Vtbx 3, [Instruction_name ["vtbx"]],
+- Use_operands [| Dreg; VecArray (3, Dreg); Dreg |], "vtbx3", table_io,
+- [U8; S8; P8];
+- Vtbx 4, [Instruction_name ["vtbx"]],
+- Use_operands [| Dreg; VecArray (4, Dreg); Dreg |], "vtbx4", table_io,
+- [U8; S8; P8];
+-
+- (* Multiply, lane. (note: these were undocumented at the time of
+- writing). *)
+- Vmul_lane, [], By_scalar Dreg, "vmul_lane", sign_invar_2_lane,
+- [S16; S32; U16; U32; F32];
+- Vmul_lane, [], By_scalar Qreg, "vmulQ_lane", sign_invar_2_lane,
+- [S16; S32; U16; U32; F32];
+-
+- (* Multiply-accumulate, lane. *)
+- Vmla_lane, [], By_scalar Dreg, "vmla_lane", sign_invar_io_lane,
+- [S16; S32; U16; U32; F32];
+- Vmla_lane, [], By_scalar Qreg, "vmlaQ_lane", sign_invar_io_lane,
+- [S16; S32; U16; U32; F32];
+- Vmla_lane, [], Wide_lane, "vmlal_lane", elts_same_io_lane,
+- [S16; S32; U16; U32];
+- Vmla_lane, [Saturating; Doubling], Wide_lane, "vqdmlal_lane",
+- elts_same_io_lane, [S16; S32];
+-
+- (* Multiply-subtract, lane. *)
+- Vmls_lane, [], By_scalar Dreg, "vmls_lane", sign_invar_io_lane,
+- [S16; S32; U16; U32; F32];
+- Vmls_lane, [], By_scalar Qreg, "vmlsQ_lane", sign_invar_io_lane,
+- [S16; S32; U16; U32; F32];
+- Vmls_lane, [], Wide_lane, "vmlsl_lane", elts_same_io_lane,
+- [S16; S32; U16; U32];
+- Vmls_lane, [Saturating; Doubling], Wide_lane, "vqdmlsl_lane",
+- elts_same_io_lane, [S16; S32];
+-
+- (* Long multiply, lane. *)
+- Vmull_lane, [],
+- Wide_lane, "vmull_lane", elts_same_2_lane, [S16; S32; U16; U32];
+-
+- (* Saturating doubling long multiply, lane. *)
+- Vqdmull_lane, [Saturating; Doubling],
+- Wide_lane, "vqdmull_lane", elts_same_2_lane, [S16; S32];
+-
+- (* Saturating doubling long multiply high, lane. *)
+- Vqdmulh_lane, [Saturating; Halving],
+- By_scalar Qreg, "vqdmulhQ_lane", elts_same_2_lane, [S16; S32];
+- Vqdmulh_lane, [Saturating; Halving],
+- By_scalar Dreg, "vqdmulh_lane", elts_same_2_lane, [S16; S32];
+- Vqdmulh_lane, [Saturating; Halving; Rounding;
+- Instruction_name ["vqrdmulh"]],
+- By_scalar Qreg, "vqRdmulhQ_lane", elts_same_2_lane, [S16; S32];
+- Vqdmulh_lane, [Saturating; Halving; Rounding;
+- Instruction_name ["vqrdmulh"]],
+- By_scalar Dreg, "vqRdmulh_lane", elts_same_2_lane, [S16; S32];
+-
+- (* Vector multiply by scalar. *)
+- Vmul_n, [InfoWord;
+- Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]],
+- Use_operands [| Dreg; Dreg; Corereg |], "vmul_n",
+- sign_invar_2, [S16; S32; U16; U32; F32];
+- Vmul_n, [InfoWord;
+- Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]],
+- Use_operands [| Qreg; Qreg; Corereg |], "vmulQ_n",
+- sign_invar_2, [S16; S32; U16; U32; F32];
+-
+- (* Vector long multiply by scalar. *)
+- Vmull_n, [Instruction_name ["vmull"];
+- Disassembles_as [Use_operands [| Qreg; Dreg; Element_of_dreg |]]],
+- Wide_scalar, "vmull_n",
+- elts_same_2, [S16; S32; U16; U32];
+-
+- (* Vector saturating doubling long multiply by scalar. *)
+- Vqdmull_n, [Saturating; Doubling;
+- Disassembles_as [Use_operands [| Qreg; Dreg;
+- Element_of_dreg |]]],
+- Wide_scalar, "vqdmull_n",
+- elts_same_2, [S16; S32];
+-
+- (* Vector saturating doubling long multiply high by scalar. *)
+- Vqdmulh_n,
+- [Saturating; Halving; InfoWord;
+- Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]],
+- Use_operands [| Qreg; Qreg; Corereg |],
+- "vqdmulhQ_n", elts_same_2, [S16; S32];
+- Vqdmulh_n,
+- [Saturating; Halving; InfoWord;
+- Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]],
+- Use_operands [| Dreg; Dreg; Corereg |],
+- "vqdmulh_n", elts_same_2, [S16; S32];
+- Vqdmulh_n,
+- [Saturating; Halving; Rounding; InfoWord;
+- Instruction_name ["vqrdmulh"];
+- Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]],
+- Use_operands [| Qreg; Qreg; Corereg |],
+- "vqRdmulhQ_n", elts_same_2, [S16; S32];
+- Vqdmulh_n,
+- [Saturating; Halving; Rounding; InfoWord;
+- Instruction_name ["vqrdmulh"];
+- Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]],
+- Use_operands [| Dreg; Dreg; Corereg |],
+- "vqRdmulh_n", elts_same_2, [S16; S32];
+-
+- (* Vector multiply-accumulate by scalar. *)
+- Vmla_n, [InfoWord;
+- Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]],
+- Use_operands [| Dreg; Dreg; Corereg |], "vmla_n",
+- sign_invar_io, [S16; S32; U16; U32; F32];
+- Vmla_n, [InfoWord;
+- Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]],
+- Use_operands [| Qreg; Qreg; Corereg |], "vmlaQ_n",
+- sign_invar_io, [S16; S32; U16; U32; F32];
+- Vmla_n, [], Wide_scalar, "vmlal_n", elts_same_io, [S16; S32; U16; U32];
+- Vmla_n, [Saturating; Doubling], Wide_scalar, "vqdmlal_n", elts_same_io,
+- [S16; S32];
+-
+- (* Vector multiply subtract by scalar. *)
+- Vmls_n, [InfoWord;
+- Disassembles_as [Use_operands [| Dreg; Dreg; Element_of_dreg |]]],
+- Use_operands [| Dreg; Dreg; Corereg |], "vmls_n",
+- sign_invar_io, [S16; S32; U16; U32; F32];
+- Vmls_n, [InfoWord;
+- Disassembles_as [Use_operands [| Qreg; Qreg; Element_of_dreg |]]],
+- Use_operands [| Qreg; Qreg; Corereg |], "vmlsQ_n",
+- sign_invar_io, [S16; S32; U16; U32; F32];
+- Vmls_n, [], Wide_scalar, "vmlsl_n", elts_same_io, [S16; S32; U16; U32];
+- Vmls_n, [Saturating; Doubling], Wide_scalar, "vqdmlsl_n", elts_same_io,
+- [S16; S32];
+-
+- (* Vector extract. *)
+- Vext, [Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+- Use_operands [| Dreg; Dreg; Dreg; Immed |], "vext", extend,
+- [P64];
+- Vext, [Const_valuator (fun _ -> 0)],
+- Use_operands [| Dreg; Dreg; Dreg; Immed |], "vext", extend,
+- pf_su_8_64;
+- Vext, [Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+- Use_operands [| Qreg; Qreg; Qreg; Immed |], "vextQ", extend,
+- [P64];
+- Vext, [Const_valuator (fun _ -> 0)],
+- Use_operands [| Qreg; Qreg; Qreg; Immed |], "vextQ", extend,
+- pf_su_8_64;
+-
+- (* Reverse elements. *)
+- Vrev64, [Use_shuffle (rev_elems 64)], All (2, Dreg), "vrev64", bits_1,
+- P8 :: P16 :: F32 :: su_8_32;
+- Vrev64, [Use_shuffle (rev_elems 64)], All (2, Qreg), "vrev64Q", bits_1,
+- P8 :: P16 :: F32 :: su_8_32;
+- Vrev32, [Use_shuffle (rev_elems 32)], All (2, Dreg), "vrev32", bits_1,
+- [P8; P16; S8; U8; S16; U16];
+- Vrev32, [Use_shuffle (rev_elems 32)], All (2, Qreg), "vrev32Q", bits_1,
+- [P8; P16; S8; U8; S16; U16];
+- Vrev16, [Use_shuffle (rev_elems 16)], All (2, Dreg), "vrev16", bits_1,
+- [P8; S8; U8];
+- Vrev16, [Use_shuffle (rev_elems 16)], All (2, Qreg), "vrev16Q", bits_1,
+- [P8; S8; U8];
+-
+- (* Bit selection. *)
+- Vbsl,
+- [Requires_feature "CRYPTO"; Instruction_name ["vbsl"; "vbit"; "vbif"];
+- Disassembles_as [Use_operands [| Dreg; Dreg; Dreg |]]],
+- Use_operands [| Dreg; Dreg; Dreg; Dreg |], "vbsl", bit_select,
+- [P64];
+- Vbsl,
+- [Instruction_name ["vbsl"; "vbit"; "vbif"];
+- Disassembles_as [Use_operands [| Dreg; Dreg; Dreg |]]],
+- Use_operands [| Dreg; Dreg; Dreg; Dreg |], "vbsl", bit_select,
+- pf_su_8_64;
+- Vbsl,
+- [Requires_feature "CRYPTO"; Instruction_name ["vbsl"; "vbit"; "vbif"];
+- Disassembles_as [Use_operands [| Qreg; Qreg; Qreg |]]],
+- Use_operands [| Qreg; Qreg; Qreg; Qreg |], "vbslQ", bit_select,
+- [P64];
+- Vbsl,
+- [Instruction_name ["vbsl"; "vbit"; "vbif"];
+- Disassembles_as [Use_operands [| Qreg; Qreg; Qreg |]]],
+- Use_operands [| Qreg; Qreg; Qreg; Qreg |], "vbslQ", bit_select,
+- pf_su_8_64;
+-
+- Vtrn, [Use_shuffle trn_elems], Pair_result Dreg, "vtrn", bits_2, pf_su_8_16;
+- Vtrn, [Use_shuffle trn_elems; Instruction_name ["vuzp"]], Pair_result Dreg, "vtrn", bits_2, suf_32;
+- Vtrn, [Use_shuffle trn_elems], Pair_result Qreg, "vtrnQ", bits_2, pf_su_8_32;
+- (* Zip elements. *)
+- Vzip, [Use_shuffle zip_elems], Pair_result Dreg, "vzip", bits_2, pf_su_8_16;
+- Vzip, [Use_shuffle zip_elems; Instruction_name ["vuzp"]], Pair_result Dreg, "vzip", bits_2, suf_32;
+- Vzip, [Use_shuffle zip_elems], Pair_result Qreg, "vzipQ", bits_2, pf_su_8_32;
+-
+- (* Unzip elements. *)
+- Vuzp, [Use_shuffle uzip_elems], Pair_result Dreg, "vuzp", bits_2,
+- pf_su_8_32;
+- Vuzp, [Use_shuffle uzip_elems], Pair_result Qreg, "vuzpQ", bits_2,
+- pf_su_8_32;
+-
+- (* Element/structure loads. VLD1 variants. *)
+- Vldx 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg |], "vld1", bits_1,
+- [P64];
+- Vldx 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg |], "vld1", bits_1,
+- pf_su_8_64;
+- Vldx 1, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q", bits_1,
+- [P64];
+- Vldx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q", bits_1,
+- pf_su_8_64;
+-
+- Vldx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |],
+- "vld1_lane", bits_3, pf_su_8_32;
+- Vldx_lane 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]];
+- Const_valuator (fun _ -> 0)],
+- Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |],
+- "vld1_lane", bits_3, [P64];
+- Vldx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]];
+- Const_valuator (fun _ -> 0)],
+- Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |],
+- "vld1_lane", bits_3, [S64; U64];
+- Vldx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
+- "vld1Q_lane", bits_3, pf_su_8_32;
+- Vldx_lane 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
+- "vld1Q_lane", bits_3, [P64];
+- Vldx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
+- "vld1Q_lane", bits_3, [S64; U64];
+-
+- Vldx_dup 1,
+- [Disassembles_as [Use_operands [| VecArray (1, All_elements_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
+- bits_1, pf_su_8_32;
+- Vldx_dup 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
+- bits_1, [P64];
+- Vldx_dup 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
+- bits_1, [S64; U64];
+- Vldx_dup 1,
+- [Disassembles_as [Use_operands [| VecArray (2, All_elements_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup",
+- bits_1, pf_su_8_32;
+- (* Treated identically to vld1_dup above as we now
+- do a single load followed by a duplicate. *)
+- Vldx_dup 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup",
+- bits_1, [P64];
+- Vldx_dup 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup",
+- bits_1, [S64; U64];
+-
+- (* VST1 variants. *)
+- Vstx 1, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Dreg |], "vst1",
+- store_1, [P64];
+- Vstx 1, [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Dreg |], "vst1",
+- store_1, pf_su_8_64;
+- Vstx 1, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Qreg |], "vst1Q",
+- store_1, [P64];
+- Vstx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Qreg |], "vst1Q",
+- store_1, pf_su_8_64;
+-
+- Vstx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Dreg; Immed |],
+- "vst1_lane", store_3, pf_su_8_32;
+- Vstx_lane 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]];
+- Const_valuator (fun _ -> 0)],
+- Use_operands [| PtrTo Corereg; Dreg; Immed |],
+- "vst1_lane", store_3, [P64];
+- Vstx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]];
+- Const_valuator (fun _ -> 0)],
+- Use_operands [| PtrTo Corereg; Dreg; Immed |],
+- "vst1_lane", store_3, [U64; S64];
+- Vstx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Qreg; Immed |],
+- "vst1Q_lane", store_3, pf_su_8_32;
+- Vstx_lane 1,
+- [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Qreg; Immed |],
+- "vst1Q_lane", store_3, [P64];
+- Vstx_lane 1,
+- [Disassembles_as [Use_operands [| VecArray (1, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; Qreg; Immed |],
+- "vst1Q_lane", store_3, [U64; S64];
+-
+- (* VLD2 variants. *)
+- Vldx 2, [], Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2", bits_1, pf_su_8_32;
+- Vldx 2, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2", bits_1, [P64];
+- Vldx 2, [Instruction_name ["vld1"]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2", bits_1, [S64; U64];
+- Vldx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- CstPtrTo Corereg |];
+- Use_operands [| VecArray (2, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Qreg); CstPtrTo Corereg |],
+- "vld2Q", bits_1, pf_su_8_32;
+-
+- Vldx_lane 2,
+- [Disassembles_as [Use_operands
+- [| VecArray (2, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg;
+- VecArray (2, Dreg); Immed |],
+- "vld2_lane", bits_3, P8 :: P16 :: F32 :: su_8_32;
+- Vldx_lane 2,
+- [Disassembles_as [Use_operands
+- [| VecArray (2, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Qreg); CstPtrTo Corereg;
+- VecArray (2, Qreg); Immed |],
+- "vld2Q_lane", bits_3, [P16; F32; U16; U32; S16; S32];
+-
+- Vldx_dup 2,
+- [Disassembles_as [Use_operands
+- [| VecArray (2, All_elements_of_dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2_dup", bits_1, pf_su_8_32;
+- Vldx_dup 2,
+- [Requires_feature "CRYPTO";
+- Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (2, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2_dup", bits_1, [P64];
+- Vldx_dup 2,
+- [Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (2, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+- "vld2_dup", bits_1, [S64; U64];
+-
+- (* VST2 variants. *)
+- Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2",
+- store_1, pf_su_8_32;
+- Vstx 2, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2",
+- store_1, [P64];
+- Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2",
+- store_1, [S64; U64];
+- Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |];
+- Use_operands [| VecArray (2, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Qreg) |], "vst2Q",
+- store_1, pf_su_8_32;
+-
+- Vstx_lane 2,
+- [Disassembles_as [Use_operands
+- [| VecArray (2, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Dreg); Immed |], "vst2_lane",
+- store_3, P8 :: P16 :: F32 :: su_8_32;
+- Vstx_lane 2,
+- [Disassembles_as [Use_operands
+- [| VecArray (2, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (2, Qreg); Immed |], "vst2Q_lane",
+- store_3, [P16; F32; U16; U32; S16; S32];
+-
+- (* VLD3 variants. *)
+- Vldx 3, [], Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3", bits_1, pf_su_8_32;
+- Vldx 3, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3", bits_1, [P64];
+- Vldx 3, [Instruction_name ["vld1"]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3", bits_1, [S64; U64];
+- Vldx 3, [Disassembles_as [Use_operands [| VecArray (3, Dreg);
+- CstPtrTo Corereg |];
+- Use_operands [| VecArray (3, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Qreg); CstPtrTo Corereg |],
+- "vld3Q", bits_1, P8 :: P16 :: F32 :: su_8_32;
+-
+- Vldx_lane 3,
+- [Disassembles_as [Use_operands
+- [| VecArray (3, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg;
+- VecArray (3, Dreg); Immed |],
+- "vld3_lane", bits_3, P8 :: P16 :: F32 :: su_8_32;
+- Vldx_lane 3,
+- [Disassembles_as [Use_operands
+- [| VecArray (3, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Qreg); CstPtrTo Corereg;
+- VecArray (3, Qreg); Immed |],
+- "vld3Q_lane", bits_3, [P16; F32; U16; U32; S16; S32];
+-
+- Vldx_dup 3,
+- [Disassembles_as [Use_operands
+- [| VecArray (3, All_elements_of_dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3_dup", bits_1, pf_su_8_32;
+- Vldx_dup 3,
+- [Requires_feature "CRYPTO";
+- Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (3, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3_dup", bits_1, [P64];
+- Vldx_dup 3,
+- [Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (3, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+- "vld3_dup", bits_1, [S64; U64];
+-
+- (* VST3 variants. *)
+- Vstx 3, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3",
+- store_1, pf_su_8_32;
+- Vstx 3, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3",
+- store_1, [P64];
+- Vstx 3, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3",
+- store_1, [S64; U64];
+- Vstx 3, [Disassembles_as [Use_operands [| VecArray (3, Dreg);
+- PtrTo Corereg |];
+- Use_operands [| VecArray (3, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Qreg) |], "vst3Q",
+- store_1, pf_su_8_32;
+-
+- Vstx_lane 3,
+- [Disassembles_as [Use_operands
+- [| VecArray (3, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Dreg); Immed |], "vst3_lane",
+- store_3, P8 :: P16 :: F32 :: su_8_32;
+- Vstx_lane 3,
+- [Disassembles_as [Use_operands
+- [| VecArray (3, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (3, Qreg); Immed |], "vst3Q_lane",
+- store_3, [P16; F32; U16; U32; S16; S32];
+-
+- (* VLD4/VST4 variants. *)
+- Vldx 4, [], Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4", bits_1, pf_su_8_32;
+- Vldx 4, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4", bits_1, [P64];
+- Vldx 4, [Instruction_name ["vld1"]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4", bits_1, [S64; U64];
+- Vldx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- CstPtrTo Corereg |];
+- Use_operands [| VecArray (4, Dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Qreg); CstPtrTo Corereg |],
+- "vld4Q", bits_1, P8 :: P16 :: F32 :: su_8_32;
+-
+- Vldx_lane 4,
+- [Disassembles_as [Use_operands
+- [| VecArray (4, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg;
+- VecArray (4, Dreg); Immed |],
+- "vld4_lane", bits_3, P8 :: P16 :: F32 :: su_8_32;
+- Vldx_lane 4,
+- [Disassembles_as [Use_operands
+- [| VecArray (4, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Qreg); CstPtrTo Corereg;
+- VecArray (4, Qreg); Immed |],
+- "vld4Q_lane", bits_3, [P16; F32; U16; U32; S16; S32];
+-
+- Vldx_dup 4,
+- [Disassembles_as [Use_operands
+- [| VecArray (4, All_elements_of_dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4_dup", bits_1, pf_su_8_32;
+- Vldx_dup 4,
+- [Requires_feature "CRYPTO";
+- Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (4, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4_dup", bits_1, [P64];
+- Vldx_dup 4,
+- [Instruction_name ["vld1"]; Disassembles_as [Use_operands
+- [| VecArray (4, Dreg); CstPtrTo Corereg |]]],
+- Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+- "vld4_dup", bits_1, [S64; U64];
+-
+- Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4",
+- store_1, pf_su_8_32;
+- Vstx 4, [Requires_feature "CRYPTO";
+- Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4",
+- store_1, [P64];
+- Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]];
+- Instruction_name ["vst1"]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4",
+- store_1, [S64; U64];
+- Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |];
+- Use_operands [| VecArray (4, Dreg);
+- PtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Qreg) |], "vst4Q",
+- store_1, pf_su_8_32;
+-
+- Vstx_lane 4,
+- [Disassembles_as [Use_operands
+- [| VecArray (4, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Dreg); Immed |], "vst4_lane",
+- store_3, P8 :: P16 :: F32 :: su_8_32;
+- Vstx_lane 4,
+- [Disassembles_as [Use_operands
+- [| VecArray (4, Element_of_dreg);
+- CstPtrTo Corereg |]]],
+- Use_operands [| PtrTo Corereg; VecArray (4, Qreg); Immed |], "vst4Q_lane",
+- store_3, [P16; F32; U16; U32; S16; S32];
+-
+- (* Logical operations. And. *)
+- Vand, [], All (3, Dreg), "vand", notype_2, su_8_32;
+- Vand, [No_op], All (3, Dreg), "vand", notype_2, [S64; U64];
+- Vand, [], All (3, Qreg), "vandQ", notype_2, su_8_64;
+-
+- (* Or. *)
+- Vorr, [], All (3, Dreg), "vorr", notype_2, su_8_32;
+- Vorr, [No_op], All (3, Dreg), "vorr", notype_2, [S64; U64];
+- Vorr, [], All (3, Qreg), "vorrQ", notype_2, su_8_64;
+-
+- (* Eor. *)
+- Veor, [], All (3, Dreg), "veor", notype_2, su_8_32;
+- Veor, [No_op], All (3, Dreg), "veor", notype_2, [S64; U64];
+- Veor, [], All (3, Qreg), "veorQ", notype_2, su_8_64;
+-
+- (* Bic (And-not). *)
+- Vbic, [Compiler_optim "-O2"], All (3, Dreg), "vbic", notype_2, su_8_32;
+- Vbic, [No_op; Compiler_optim "-O2"], All (3, Dreg), "vbic", notype_2, [S64; U64];
+- Vbic, [Compiler_optim "-O2"], All (3, Qreg), "vbicQ", notype_2, su_8_64;
+-
+- (* Or-not. *)
+- Vorn, [Compiler_optim "-O2"], All (3, Dreg), "vorn", notype_2, su_8_32;
+- Vorn, [No_op; Compiler_optim "-O2"], All (3, Dreg), "vorn", notype_2, [S64; U64];
+- Vorn, [Compiler_optim "-O2"], All (3, Qreg), "vornQ", notype_2, su_8_64;
+- ]
+-
+-let type_in_crypto_only t
+- = (t == P64) || (t == P128)
+-
+-let cross_product s1 s2
+- = List.filter (fun (e, e') -> e <> e')
+- (List.concat (List.map (fun e1 -> List.map (fun e2 -> (e1,e2)) s1) s2))
+-
+-let reinterp =
+- let elems = P8 :: P16 :: F32 :: P64 :: su_8_64 in
+- let casts = cross_product elems elems in
+- List.map
+- (fun (convto, convfrom) ->
+- Vreinterp, (if (type_in_crypto_only convto) || (type_in_crypto_only convfrom)
+- then [Requires_feature "CRYPTO"] else []) @ [No_op], Use_operands [| Dreg; Dreg |],
+- "vreinterpret", conv_1, [Cast (convto, convfrom)])
+- casts
+-
+-let reinterpq =
+- let elems = P8 :: P16 :: F32 :: P64 :: P128 :: su_8_64 in
+- let casts = cross_product elems elems in
+- List.map
+- (fun (convto, convfrom) ->
+- Vreinterp, (if (type_in_crypto_only convto) || (type_in_crypto_only convfrom)
+- then [Requires_feature "CRYPTO"] else []) @ [No_op], Use_operands [| Qreg; Qreg |],
+- "vreinterpretQ", conv_1, [Cast (convto, convfrom)])
+- casts
+-
+-(* Output routines. *)
+-
+-let rec string_of_elt = function
+- S8 -> "s8" | S16 -> "s16" | S32 -> "s32" | S64 -> "s64"
+- | U8 -> "u8" | U16 -> "u16" | U32 -> "u32" | U64 -> "u64"
+- | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64"
+- | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64"
+- | F16 -> "f16" | F32 -> "f32" | P8 -> "p8" | P16 -> "p16"
+- | P64 -> "p64" | P128 -> "p128"
+- | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b
+- | NoElts -> failwith "No elts"
+-
+-let string_of_elt_dots elt =
+- match elt with
+- Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "." ^ string_of_elt b
+- | _ -> string_of_elt elt
+-
+-let string_of_vectype vt =
+- let rec name affix = function
+- T_int8x8 -> affix "int8x8"
+- | T_int8x16 -> affix "int8x16"
+- | T_int16x4 -> affix "int16x4"
+- | T_int16x8 -> affix "int16x8"
+- | T_int32x2 -> affix "int32x2"
+- | T_int32x4 -> affix "int32x4"
+- | T_int64x1 -> affix "int64x1"
+- | T_int64x2 -> affix "int64x2"
+- | T_uint8x8 -> affix "uint8x8"
+- | T_uint8x16 -> affix "uint8x16"
+- | T_uint16x4 -> affix "uint16x4"
+- | T_uint16x8 -> affix "uint16x8"
+- | T_uint32x2 -> affix "uint32x2"
+- | T_uint32x4 -> affix "uint32x4"
+- | T_uint64x1 -> affix "uint64x1"
+- | T_uint64x2 -> affix "uint64x2"
+- | T_float16x4 -> affix "float16x4"
+- | T_float32x2 -> affix "float32x2"
+- | T_float32x4 -> affix "float32x4"
+- | T_poly8x8 -> affix "poly8x8"
+- | T_poly8x16 -> affix "poly8x16"
+- | T_poly16x4 -> affix "poly16x4"
+- | T_poly16x8 -> affix "poly16x8"
+- | T_int8 -> affix "int8"
+- | T_int16 -> affix "int16"
+- | T_int32 -> affix "int32"
+- | T_int64 -> affix "int64"
+- | T_uint8 -> affix "uint8"
+- | T_uint16 -> affix "uint16"
+- | T_uint32 -> affix "uint32"
+- | T_uint64 -> affix "uint64"
+- | T_poly8 -> affix "poly8"
+- | T_poly16 -> affix "poly16"
+- | T_poly64 -> affix "poly64"
+- | T_poly64x1 -> affix "poly64x1"
+- | T_poly64x2 -> affix "poly64x2"
+- | T_poly128 -> affix "poly128"
+- | T_float16 -> affix "float16"
+- | T_float32 -> affix "float32"
+- | T_immediate _ -> "const int"
+- | T_void -> "void"
+- | T_intQI -> "__builtin_neon_qi"
+- | T_intHI -> "__builtin_neon_hi"
+- | T_intSI -> "__builtin_neon_si"
+- | T_intDI -> "__builtin_neon_di"
+- | T_intTI -> "__builtin_neon_ti"
+- | T_floatHF -> "__builtin_neon_hf"
+- | T_floatSF -> "__builtin_neon_sf"
+- | T_arrayof (num, base) ->
+- let basename = name (fun x -> x) base in
+- affix (Printf.sprintf "%sx%d" basename num)
+- | T_ptrto x ->
+- let basename = name affix x in
+- Printf.sprintf "%s *" basename
+- | T_const x ->
+- let basename = name affix x in
+- Printf.sprintf "const %s" basename
+- in
+- name (fun x -> x ^ "_t") vt
+-
+-let string_of_inttype = function
+- B_TImode -> "__builtin_neon_ti"
+- | B_EImode -> "__builtin_neon_ei"
+- | B_OImode -> "__builtin_neon_oi"
+- | B_CImode -> "__builtin_neon_ci"
+- | B_XImode -> "__builtin_neon_xi"
+-
+-let string_of_mode = function
+- V8QI -> "v8qi" | V4HI -> "v4hi" | V4HF -> "v4hf" | V2SI -> "v2si"
+- | V2SF -> "v2sf" | DI -> "di" | V16QI -> "v16qi" | V8HI -> "v8hi"
+- | V4SI -> "v4si" | V4SF -> "v4sf" | V2DI -> "v2di" | QI -> "qi"
+- | HI -> "hi" | SI -> "si" | SF -> "sf" | TI -> "ti"
+-
+-(* Use uppercase chars for letters which form part of the intrinsic name, but
+- should be omitted from the builtin name (the info is passed in an extra
+- argument, instead). *)
+-let intrinsic_name name = String.lowercase name
+-
+-(* Allow the name of the builtin to be overridden by things (e.g. Flipped)
+- found in the features list. *)
+-let builtin_name features name =
+- let name = List.fold_right
+- (fun el name ->
+- match el with
+- Flipped x | Builtin_name x -> x
+- | _ -> name)
+- features name in
+- let islower x = let str = String.make 1 x in (String.lowercase str) = str
+- and buf = Buffer.create (String.length name) in
+- String.iter (fun c -> if islower c then Buffer.add_char buf c) name;
+- Buffer.contents buf
+-
+-(* Transform an arity into a list of strings. *)
+-let strings_of_arity a =
+- match a with
+- | Arity0 vt -> [string_of_vectype vt]
+- | Arity1 (vt1, vt2) -> [string_of_vectype vt1; string_of_vectype vt2]
+- | Arity2 (vt1, vt2, vt3) -> [string_of_vectype vt1;
+- string_of_vectype vt2;
+- string_of_vectype vt3]
+- | Arity3 (vt1, vt2, vt3, vt4) -> [string_of_vectype vt1;
+- string_of_vectype vt2;
+- string_of_vectype vt3;
+- string_of_vectype vt4]
+- | Arity4 (vt1, vt2, vt3, vt4, vt5) -> [string_of_vectype vt1;
+- string_of_vectype vt2;
+- string_of_vectype vt3;
+- string_of_vectype vt4;
+- string_of_vectype vt5]
+-
+-(* Suffixes on the end of builtin names that are to be stripped in order
+- to obtain the name used as an instruction. They are only stripped if
+- preceded immediately by an underscore. *)
+-let suffixes_to_strip = [ "n"; "lane"; "dup" ]
+-
+-(* Get the possible names of an instruction corresponding to a "name" from the
+- ops table. This is done by getting the equivalent builtin name and
+- stripping any suffixes from the list at the top of this file, unless
+- the features list presents with an Instruction_name entry, in which
+- case that is used; or unless the features list presents with a Flipped
+- entry, in which case that is used. If both such entries are present,
+- the first in the list will be chosen. *)
+-let get_insn_names features name =
+- let names = try
+- begin
+- match List.find (fun feature -> match feature with
+- Instruction_name _ -> true
+- | Flipped _ -> true
+- | _ -> false) features
+- with
+- Instruction_name names -> names
+- | Flipped name -> [name]
+- | _ -> assert false
+- end
+- with Not_found -> [builtin_name features name]
+- in
+- begin
+- List.map (fun name' ->
+- try
+- let underscore = String.rindex name' '_' in
+- let our_suffix = String.sub name' (underscore + 1)
+- ((String.length name') - underscore - 1)
+- in
+- let rec strip remaining_suffixes =
+- match remaining_suffixes with
+- [] -> name'
+- | s::ss when our_suffix = s -> String.sub name' 0 underscore
+- | _::ss -> strip ss
+- in
+- strip suffixes_to_strip
+- with (Not_found | Invalid_argument _) -> name') names
+- end
+-
+-(* Apply a function to each element of a list and then comma-separate
+- the resulting strings. *)
+-let rec commas f elts acc =
+- match elts with
+- [] -> acc
+- | [elt] -> acc ^ (f elt)
+- | elt::elts ->
+- commas f elts (acc ^ (f elt) ^ ", ")
+-
+-(* Given a list of features and the shape specified in the "ops" table, apply
+- a function to each possible shape that the instruction may have.
+- By default, this is the "shape" entry in "ops". If the features list
+- contains a Disassembles_as entry, the shapes contained in that entry are
+- mapped to corresponding outputs and returned in a list. If there is more
+- than one Disassembles_as entry, only the first is used. *)
+-let analyze_all_shapes features shape f =
+- try
+- match List.find (fun feature ->
+- match feature with Disassembles_as _ -> true
+- | _ -> false)
+- features with
+- Disassembles_as shapes -> List.map f shapes
+- | _ -> assert false
+- with Not_found -> [f shape]
+-
+-(* The crypto intrinsics have unconventional shapes and are not that
+- numerous to be worth the trouble of encoding here. We implement them
+- explicitly here. *)
+-let crypto_intrinsics =
+-"
+-#ifdef __ARM_FEATURE_CRYPTO
+-
+-__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+-vldrq_p128 (poly128_t const * __ptr)
+-{
+-#ifdef __ARM_BIG_ENDIAN
+- poly64_t* __ptmp = (poly64_t*) __ptr;
+- poly64_t __d0 = vld1_p64 (__ptmp);
+- poly64_t __d1 = vld1_p64 (__ptmp + 1);
+- return vreinterpretq_p128_p64 (vcombine_p64 (__d1, __d0));
+-#else
+- return vreinterpretq_p128_p64 (vld1q_p64 ((poly64_t*) __ptr));
+-#endif
+-}
+-
+-__extension__ static __inline void __attribute__ ((__always_inline__))
+-vstrq_p128 (poly128_t * __ptr, poly128_t __val)
+-{
+-#ifdef __ARM_BIG_ENDIAN
+- poly64x2_t __tmp = vreinterpretq_p64_p128 (__val);
+- poly64_t __d0 = vget_high_p64 (__tmp);
+- poly64_t __d1 = vget_low_p64 (__tmp);
+- vst1q_p64 ((poly64_t*) __ptr, vcombine_p64 (__d0, __d1));
+-#else
+- vst1q_p64 ((poly64_t*) __ptr, vreinterpretq_p64_p128 (__val));
+-#endif
+-}
+-
+-/* The vceq_p64 intrinsic does not map to a single instruction.
+- Instead we emulate it by performing a 32-bit variant of the vceq
+- and applying a pairwise min reduction to the result.
+- vceq_u32 will produce two 32-bit halves, each of which will contain either
+- all ones or all zeros depending on whether the corresponding 32-bit
+- halves of the poly64_t were equal. The whole poly64_t values are equal
+- if and only if both halves are equal, i.e. vceq_u32 returns all ones.
+- If the result is all zeroes for any half then the whole result is zeroes.
+- This is what the pairwise min reduction achieves. */
+-
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vceq_p64 (poly64x1_t __a, poly64x1_t __b)
+-{
+- uint32x2_t __t_a = vreinterpret_u32_p64 (__a);
+- uint32x2_t __t_b = vreinterpret_u32_p64 (__b);
+- uint32x2_t __c = vceq_u32 (__t_a, __t_b);
+- uint32x2_t __m = vpmin_u32 (__c, __c);
+- return vreinterpret_u64_u32 (__m);
+-}
+-
+-/* The vtst_p64 intrinsic does not map to a single instruction.
+- We emulate it in way similar to vceq_p64 above but here we do
+- a reduction with max since if any two corresponding bits
+- in the two poly64_t's match, then the whole result must be all ones. */
+-
+-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+-vtst_p64 (poly64x1_t __a, poly64x1_t __b)
+-{
+- uint32x2_t __t_a = vreinterpret_u32_p64 (__a);
+- uint32x2_t __t_b = vreinterpret_u32_p64 (__b);
+- uint32x2_t __c = vtst_u32 (__t_a, __t_b);
+- uint32x2_t __m = vpmax_u32 (__c, __c);
+- return vreinterpret_u64_u32 (__m);
+-}
+-
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaeseq_u8 (uint8x16_t __data, uint8x16_t __key)
+-{
+- return __builtin_arm_crypto_aese (__data, __key);
+-}
+-
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesdq_u8 (uint8x16_t __data, uint8x16_t __key)
+-{
+- return __builtin_arm_crypto_aesd (__data, __key);
+-}
+-
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesmcq_u8 (uint8x16_t __data)
+-{
+- return __builtin_arm_crypto_aesmc (__data);
+-}
+-
+-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+-vaesimcq_u8 (uint8x16_t __data)
+-{
+- return __builtin_arm_crypto_aesimc (__data);
+-}
+-
+-__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+-vsha1h_u32 (uint32_t __hash_e)
+-{
+- uint32x4_t __t = vdupq_n_u32 (0);
+- __t = vsetq_lane_u32 (__hash_e, __t, 0);
+- __t = __builtin_arm_crypto_sha1h (__t);
+- return vgetq_lane_u32 (__t, 0);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1cq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+-{
+- uint32x4_t __t = vdupq_n_u32 (0);
+- __t = vsetq_lane_u32 (__hash_e, __t, 0);
+- return __builtin_arm_crypto_sha1c (__hash_abcd, __t, __wk);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1pq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+-{
+- uint32x4_t __t = vdupq_n_u32 (0);
+- __t = vsetq_lane_u32 (__hash_e, __t, 0);
+- return __builtin_arm_crypto_sha1p (__hash_abcd, __t, __wk);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1mq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+-{
+- uint32x4_t __t = vdupq_n_u32 (0);
+- __t = vsetq_lane_u32 (__hash_e, __t, 0);
+- return __builtin_arm_crypto_sha1m (__hash_abcd, __t, __wk);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7, uint32x4_t __w8_11)
+-{
+- return __builtin_arm_crypto_sha1su0 (__w0_3, __w4_7, __w8_11);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha1su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w12_15)
+-{
+- return __builtin_arm_crypto_sha1su1 (__tw0_3, __w12_15);
+-}
+-
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256hq_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
+-{
+- return __builtin_arm_crypto_sha256h (__hash_abcd, __hash_efgh, __wk);
+-}
-
--#define vcvtq_n_f32_u32(a, b) \
-- __extension__ \
-- ({ \
-- uint32x4_t a_ = (a); \
-- float32x4_t result; \
-- __asm__ ("ucvtf %0.4s, %1.4s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256h2q_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
+-{
+- return __builtin_arm_crypto_sha256h2 (__hash_abcd, __hash_efgh, __wk);
+-}
-
--#define vcvtq_n_f64_s64(a, b) \
-- __extension__ \
-- ({ \
-- int64x2_t a_ = (a); \
-- float64x2_t result; \
-- __asm__ ("scvtf %0.2d, %1.2d, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7)
+-{
+- return __builtin_arm_crypto_sha256su0 (__w0_3, __w4_7);
+-}
-
--#define vcvtq_n_f64_u64(a, b) \
-- __extension__ \
-- ({ \
-- uint64x2_t a_ = (a); \
-- float64x2_t result; \
-- __asm__ ("ucvtf %0.2d, %1.2d, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+-vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w8_11, uint32x4_t __w12_15)
+-{
+- return __builtin_arm_crypto_sha256su1 (__tw0_3, __w8_11, __w12_15);
+-}
-
--#define vcvtq_n_s32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32x4_t a_ = (a); \
-- int32x4_t result; \
-- __asm__ ("fcvtzs %0.4s, %1.4s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+-vmull_p64 (poly64_t __a, poly64_t __b)
+-{
+- return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __a, (uint64_t) __b);
+-}
-
--#define vcvtq_n_s64_f64(a, b) \
-- __extension__ \
-- ({ \
-- float64x2_t a_ = (a); \
-- int64x2_t result; \
-- __asm__ ("fcvtzs %0.2d, %1.2d, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+-vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+-{
+- poly64_t __t1 = vget_high_p64 (__a);
+- poly64_t __t2 = vget_high_p64 (__b);
-
--#define vcvtq_n_u32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32x4_t a_ = (a); \
-- uint32x4_t result; \
-- __asm__ ("fcvtzu %0.4s, %1.4s, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+- return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __t1, (uint64_t) __t2);
+-}
-
--#define vcvtq_n_u64_f64(a, b) \
-- __extension__ \
-- ({ \
-- float64x2_t a_ = (a); \
-- uint64x2_t result; \
-- __asm__ ("fcvtzu %0.2d, %1.2d, #%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+-#endif
+-"
+--- a/src/gcc/config/arm/predicates.md
++++ b/src/gcc/config/arm/predicates.md
+@@ -141,8 +141,7 @@
+ (match_test "const_ok_for_arm (~INTVAL (op))")))
+
+ (define_predicate "const0_operand"
+- (and (match_code "const_int")
+- (match_test "INTVAL (op) == 0")))
++ (match_test "op == CONST0_RTX (mode)"))
+
+ ;; Something valid on the RHS of an ARM data-processing instruction
+ (define_predicate "arm_rhs_operand"
+@@ -170,8 +169,7 @@
+
+ (define_predicate "const_neon_scalar_shift_amount_operand"
+ (and (match_code "const_int")
+- (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
+- && ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
++ (match_test "IN_RANGE (UINTVAL (op), 1, GET_MODE_BITSIZE (mode))")))
+
+ (define_predicate "ldrd_strd_offset_operand"
+ (and (match_operand 0 "const_int_operand")
+@@ -285,19 +283,19 @@
+ (match_test "power_of_two_operand (XEXP (op, 1), mode)"))
+ (and (match_code "rotate")
+ (match_test "CONST_INT_P (XEXP (op, 1))
+- && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
++ && (UINTVAL (XEXP (op, 1))) < 32")))
+ (and (match_code "ashift,ashiftrt,lshiftrt,rotatert")
+ (match_test "!CONST_INT_P (XEXP (op, 1))
+- || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
++ || (UINTVAL (XEXP (op, 1))) < 32")))
+ (match_test "mode == GET_MODE (op)")))
+
+ (define_special_predicate "shift_nomul_operator"
+ (and (ior (and (match_code "rotate")
+ (match_test "CONST_INT_P (XEXP (op, 1))
+- && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32"))
++ && (UINTVAL (XEXP (op, 1))) < 32"))
+ (and (match_code "ashift,ashiftrt,lshiftrt,rotatert")
+ (match_test "!CONST_INT_P (XEXP (op, 1))
+- || ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")))
++ || (UINTVAL (XEXP (op, 1))) < 32")))
+ (match_test "mode == GET_MODE (op)")))
+
+ ;; True for shift operators which can be used with saturation instructions.
+@@ -306,7 +304,7 @@
+ (match_test "power_of_two_operand (XEXP (op, 1), mode)"))
+ (and (match_code "ashift,ashiftrt")
+ (match_test "CONST_INT_P (XEXP (op, 1))
+- && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1)) < 32)")))
++ && (UINTVAL (XEXP (op, 1)) < 32)")))
+ (match_test "mode == GET_MODE (op)")))
+
+ ;; True for MULT, to identify which variant of shift_operator is in use.
+@@ -532,7 +530,7 @@
+ (ior (and (match_code "reg,subreg")
+ (match_operand 0 "s_register_operand"))
+ (and (match_code "const_int")
+- (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) < 256"))))
++ (match_test "(UINTVAL (op)) < 256"))))
+
+ (define_predicate "thumb1_cmpneg_operand"
+ (and (match_code "const_int")
+@@ -612,59 +610,13 @@
+ (define_special_predicate "vect_par_constant_high"
+ (match_code "parallel")
+ {
+- HOST_WIDE_INT count = XVECLEN (op, 0);
+- int i;
+- int base = GET_MODE_NUNITS (mode);
-
--#define vcvts_n_f32_s32(a, b) \
-- __extension__ \
-- ({ \
-- int32_t a_ = (a); \
-- float32_t result; \
-- __asm__ ("scvtf %s0,%s1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+- if ((count < 1)
+- || (count != base/2))
+- return false;
+-
+- if (!VECTOR_MODE_P (mode))
+- return false;
-
--#define vcvts_n_f32_u32(a, b) \
-- __extension__ \
-- ({ \
-- uint32_t a_ = (a); \
-- float32_t result; \
-- __asm__ ("ucvtf %s0,%s1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+- for (i = 0; i < count; i++)
+- {
+- rtx elt = XVECEXP (op, 0, i);
+- int val;
+-
+- if (!CONST_INT_P (elt))
+- return false;
+-
+- val = INTVAL (elt);
+- if (val != (base/2) + i)
+- return false;
+- }
+- return true;
++ return arm_simd_check_vect_par_cnst_half_p (op, mode, true);
+ })
+
+ (define_special_predicate "vect_par_constant_low"
+ (match_code "parallel")
+ {
+- HOST_WIDE_INT count = XVECLEN (op, 0);
+- int i;
+- int base = GET_MODE_NUNITS (mode);
+-
+- if ((count < 1)
+- || (count != base/2))
+- return false;
+-
+- if (!VECTOR_MODE_P (mode))
+- return false;
+-
+- for (i = 0; i < count; i++)
+- {
+- rtx elt = XVECEXP (op, 0, i);
+- int val;
+-
+- if (!CONST_INT_P (elt))
+- return false;
+-
+- val = INTVAL (elt);
+- if (val != i)
+- return false;
+- }
+- return true;
++ return arm_simd_check_vect_par_cnst_half_p (op, mode, false);
+ })
+
+ (define_predicate "const_double_vcvt_power_of_two_reciprocal"
+--- a/src/gcc/config/arm/sync.md
++++ b/src/gcc/config/arm/sync.md
+@@ -117,7 +117,7 @@
+ [(match_operand:DI 0 "s_register_operand") ;; val out
+ (match_operand:DI 1 "mem_noofs_operand") ;; memory
+ (match_operand:SI 2 "const_int_operand")] ;; model
+- "(TARGET_HAVE_LDREXD || TARGET_HAVE_LPAE || TARGET_HAVE_LDACQ)
++ "(TARGET_HAVE_LDREXD || TARGET_HAVE_LPAE || TARGET_HAVE_LDACQEXD)
+ && ARM_DOUBLEWORD_ALIGN"
+ {
+ memmodel model = memmodel_from_int (INTVAL (operands[2]));
+@@ -125,7 +125,7 @@
+ /* For ARMv8-A we can use an LDAEXD to atomically load two 32-bit registers
+ when acquire or stronger semantics are needed. When the relaxed model is
+ used this can be relaxed to a normal LDRD. */
+- if (TARGET_HAVE_LDACQ)
++ if (TARGET_HAVE_LDACQEXD)
+ {
+ if (is_mm_relaxed (model))
+ emit_insn (gen_arm_atomic_loaddi2_ldrd (operands[0], operands[1]));
+@@ -436,7 +436,7 @@
+ (unspec_volatile:DI
+ [(match_operand:DI 1 "mem_noofs_operand" "Ua")]
+ VUNSPEC_LAX))]
+- "TARGET_HAVE_LDACQ && ARM_DOUBLEWORD_ALIGN"
++ "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
+ "ldaexd%?\t%0, %H0, %C1"
+ [(set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")])
+@@ -452,14 +452,13 @@
+ {
+ if (<MODE>mode == DImode)
+ {
+- rtx value = operands[2];
+ /* The restrictions on target registers in ARM mode are that the two
+ registers are consecutive and the first one is even; Thumb is
+ actually more flexible, but DI should give us this anyway.
+- Note that the 1st register always gets the lowest word in memory. */
+- gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+- operands[3] = gen_rtx_REG (SImode, REGNO (value) + 1);
+- return "strexd%?\t%0, %2, %3, %C1";
++ Note that the 1st register always gets the
++ lowest word in memory. */
++ gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
++ return "strexd%?\t%0, %2, %H2, %C1";
+ }
+ return "strex<sync_sfx>%?\t%0, %2, %C1";
+ }
+@@ -473,13 +472,11 @@
+ (unspec_volatile:DI
+ [(match_operand:DI 2 "s_register_operand" "r")]
+ VUNSPEC_SLX))]
+- "TARGET_HAVE_LDACQ && ARM_DOUBLEWORD_ALIGN"
++ "TARGET_HAVE_LDACQEXD && ARM_DOUBLEWORD_ALIGN"
+ {
+- rtx value = operands[2];
+ /* See comment in arm_store_exclusive<mode> above. */
+- gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+- operands[3] = gen_rtx_REG (SImode, REGNO (value) + 1);
+- return "stlexd%?\t%0, %2, %3, %C1";
++ gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
++ return "stlexd%?\t%0, %2, %H2, %C1";
+ }
+ [(set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")])
+--- a/src/gcc/config/arm/t-aprofile
++++ b/src/gcc/config/arm/t-aprofile
+@@ -49,38 +49,33 @@ MULTILIB_DIRNAMES += fpv3 simdv1 fpv4 simdvfpv4 simdv8
+ MULTILIB_OPTIONS += mfloat-abi=softfp/mfloat-abi=hard
+ MULTILIB_DIRNAMES += softfp hard
+
+-# We don't build no-float libraries with an FPU.
+-MULTILIB_EXCEPTIONS += *mfpu=vfpv3-d16
+-MULTILIB_EXCEPTIONS += *mfpu=neon
+-MULTILIB_EXCEPTIONS += *mfpu=vfpv4-d16
+-MULTILIB_EXCEPTIONS += *mfpu=neon-vfpv4
+-MULTILIB_EXCEPTIONS += *mfpu=neon-fp-armv8
+-
+-# We don't build libraries requiring an FPU at the CPU/Arch/ISA level.
+-MULTILIB_EXCEPTIONS += mfloat-abi=*
+-MULTILIB_EXCEPTIONS += mfpu=*
+-MULTILIB_EXCEPTIONS += mthumb/mfloat-abi=*
+-MULTILIB_EXCEPTIONS += mthumb/mfpu=*
+-MULTILIB_EXCEPTIONS += *march=armv7-a/mfloat-abi=*
+-MULTILIB_EXCEPTIONS += *march=armv7ve/mfloat-abi=*
+-MULTILIB_EXCEPTIONS += *march=armv8-a/mfloat-abi=*
+-
+-# Ensure the correct FPU variants apply to the correct base architectures.
+-MULTILIB_EXCEPTIONS += *march=armv7ve/*mfpu=vfpv3-d16*
+-MULTILIB_EXCEPTIONS += *march=armv7ve/*mfpu=neon/*
+-MULTILIB_EXCEPTIONS += *march=armv8-a/*mfpu=vfpv3-d16*
+-MULTILIB_EXCEPTIONS += *march=armv8-a/*mfpu=neon/*
+-MULTILIB_EXCEPTIONS += *march=armv7-a/*mfpu=vfpv4-d16*
+-MULTILIB_EXCEPTIONS += *march=armv7-a/*mfpu=neon-vfpv4*
+-MULTILIB_EXCEPTIONS += *march=armv8-a/*mfpu=vfpv4-d16*
+-MULTILIB_EXCEPTIONS += *march=armv8-a/*mfpu=neon-vfpv4*
+-MULTILIB_EXCEPTIONS += *march=armv7-a/*mfpu=neon-fp-armv8*
+-MULTILIB_EXCEPTIONS += *march=armv7ve/*mfpu=neon-fp-armv8*
++
++# Option combinations to build library with
++
++# Default CPU/Arch (ARM is implicitly included because it uses the default
++# multilib)
++MULTILIB_REQUIRED += mthumb
++
++# ARMv7-A
++MULTILIB_REQUIRED += *march=armv7-a
++MULTILIB_REQUIRED += *march=armv7-a/mfpu=vfpv3-d16/mfloat-abi=*
++MULTILIB_REQUIRED += *march=armv7-a/mfpu=neon/mfloat-abi=*
++
++# ARMv7VE
++MULTILIB_REQUIRED += *march=armv7ve
++MULTILIB_REQUIRED += *march=armv7ve/mfpu=vfpv4-d16/mfloat-abi=*
++MULTILIB_REQUIRED += *march=armv7ve/mfpu=neon-vfpv4/mfloat-abi=*
++
++# ARMv8-A
++MULTILIB_REQUIRED += *march=armv8-a
++MULTILIB_REQUIRED += *march=armv8-a/mfpu=neon-fp-armv8/mfloat-abi=*
++
+
+ # CPU Matches
+ MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a8
+ MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a9
+ MULTILIB_MATCHES += march?armv7-a=mcpu?cortex-a5
++MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a7
+ MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a15
+ MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a12
+ MULTILIB_MATCHES += march?armv7ve=mcpu?cortex-a17
+@@ -93,6 +88,9 @@ MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a57
+ MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a57.cortex-a53
+ MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a72
+ MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a72.cortex-a53
++MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a73
++MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a73.cortex-a35
++MULTILIB_MATCHES += march?armv8-a=mcpu?cortex-a73.cortex-a53
+ MULTILIB_MATCHES += march?armv8-a=mcpu?exynos-m1
+ MULTILIB_MATCHES += march?armv8-a=mcpu?qdf24xx
+ MULTILIB_MATCHES += march?armv8-a=mcpu?xgene1
+@@ -101,12 +99,17 @@ MULTILIB_MATCHES += march?armv8-a=mcpu?xgene1
+ MULTILIB_MATCHES += march?armv8-a=march?armv8-a+crc
+ MULTILIB_MATCHES += march?armv8-a=march?armv8.1-a
+ MULTILIB_MATCHES += march?armv8-a=march?armv8.1-a+crc
++MULTILIB_MATCHES += march?armv8-a=march?armv8.2-a
++MULTILIB_MATCHES += march?armv8-a=march?armv8.2-a+fp16
+
+ # FPU matches
+ MULTILIB_MATCHES += mfpu?vfpv3-d16=mfpu?vfpv3
+ MULTILIB_MATCHES += mfpu?vfpv3-d16=mfpu?vfpv3-fp16
+-MULTILIB_MATCHES += mfpu?vfpv3-d16=mfpu?vfpv3-fp16-d16
++MULTILIB_MATCHES += mfpu?vfpv3-d16=mfpu?vfpv3-d16-fp16
++MULTILIB_MATCHES += mfpu?neon=mfpu?neon-fp16
+ MULTILIB_MATCHES += mfpu?vfpv4-d16=mfpu?vfpv4
++MULTILIB_MATCHES += mfpu?vfpv4-d16=mfpu?fpv5-d16
++MULTILIB_MATCHES += mfpu?vfpv4-d16=mfpu?fp-armv8
+ MULTILIB_MATCHES += mfpu?neon-fp-armv8=mfpu?crypto-neon-fp-armv8
+
+
+@@ -124,10 +127,6 @@ MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv8
+ MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv8-a/mfpu.vfpv3-d16/mfloat-abi.softfp
+ MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7-a/mfpu.vfpv4-d16/mfloat-abi.hard
+ MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7-a/mfpu.vfpv4-d16/mfloat-abi.softfp
+-MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7-a/mfpu.fp-armv8/mfloat-abi.hard
+-MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7-a/mfpu.fp-armv8/mfloat-abi.softfp
+-MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7-a/mfpu.vfpv4/mfloat-abi.hard
+-MULTILIB_REUSE += march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7-a/mfpu.vfpv4/mfloat-abi.softfp
+
+
+ MULTILIB_REUSE += march.armv7-a/mfpu.neon/mfloat-abi.hard=march.armv7ve/mfpu.neon/mfloat-abi.hard
+@@ -140,10 +139,6 @@ MULTILIB_REUSE += march.armv7-a/mfpu.neon/mfloat-abi.hard=march.armv7-a/mf
+ MULTILIB_REUSE += march.armv7-a/mfpu.neon/mfloat-abi.softfp=march.armv7-a/mfpu.neon-fp-armv8/mfloat-abi.softfp
+
+
+-MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=march.armv7ve/mfpu.fp-armv8/mfloat-abi.hard
+-MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=march.armv7ve/mfpu.fp-armv8/mfloat-abi.softfp
+-MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=march.armv8-a/mfpu.vfpv4/mfloat-abi.hard
+-MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=march.armv8-a/mfpu.vfpv4/mfloat-abi.softfp
+ MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=march.armv8-a/mfpu.vfpv4-d16/mfloat-abi.hard
+ MULTILIB_REUSE += march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=march.armv8-a/mfpu.vfpv4-d16/mfloat-abi.softfp
+
+@@ -163,10 +158,6 @@ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=mthu
+ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv8-a/mfpu.vfpv3-d16/mfloat-abi.softfp
+ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=mthumb/march.armv7-a/mfpu.vfpv4-d16/mfloat-abi.hard
+ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv7-a/mfpu.vfpv4-d16/mfloat-abi.softfp
+-MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=mthumb/march.armv7-a/mfpu.fp-armv8/mfloat-abi.hard
+-MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv7-a/mfpu.fp-armv8/mfloat-abi.softfp
+-MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.hard=mthumb/march.armv7-a/mfpu.vfpv4/mfloat-abi.hard
+-MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv7-a/mfpu.vfpv4/mfloat-abi.softfp
+
+
+ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.neon/mfloat-abi.hard=mthumb/march.armv7ve/mfpu.neon/mfloat-abi.hard
+@@ -179,10 +170,6 @@ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.neon/mfloat-abi.hard=mthumb/ma
+ MULTILIB_REUSE += mthumb/march.armv7-a/mfpu.neon/mfloat-abi.softfp=mthumb/march.armv7-a/mfpu.neon-fp-armv8/mfloat-abi.softfp
+
+
+-MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=mthumb/march.armv7ve/mfpu.fp-armv8/mfloat-abi.hard
+-MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=mthumb/march.armv7ve/mfpu.fp-armv8/mfloat-abi.softfp
+-MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=mthumb/march.armv8-a/mfpu.vfpv4/mfloat-abi.hard
+-MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=mthumb/march.armv8-a/mfpu.vfpv4/mfloat-abi.softfp
+ MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.hard=mthumb/march.armv8-a/mfpu.vfpv4-d16/mfloat-abi.hard
+ MULTILIB_REUSE += mthumb/march.armv7ve/mfpu.vfpv4-d16/mfloat-abi.softfp=mthumb/march.armv8-a/mfpu.vfpv4-d16/mfloat-abi.softfp
+
+--- a/src/gcc/config/arm/t-arm
++++ b/src/gcc/config/arm/t-arm
+@@ -95,7 +95,8 @@ arm.o: $(srcdir)/config/arm/arm.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+ $(srcdir)/config/arm/arm-cores.def \
+ $(srcdir)/config/arm/arm-arches.def $(srcdir)/config/arm/arm-fpus.def \
+ $(srcdir)/config/arm/arm-protos.h \
+- $(srcdir)/config/arm/arm_neon_builtins.def
++ $(srcdir)/config/arm/arm_neon_builtins.def \
++ $(srcdir)/config/arm/arm_vfp_builtins.def
+
+ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
+ $(SYSTEM_H) coretypes.h $(TM_H) \
+@@ -103,6 +104,7 @@ arm-builtins.o: $(srcdir)/config/arm/arm-builtins.c $(CONFIG_H) \
+ $(DIAGNOSTIC_CORE_H) $(OPTABS_H) \
+ $(srcdir)/config/arm/arm-protos.h \
+ $(srcdir)/config/arm/arm_neon_builtins.def \
++ $(srcdir)/config/arm/arm_vfp_builtins.def \
+ $(srcdir)/config/arm/arm-simd-builtin-types.def
+ $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+ $(srcdir)/config/arm/arm-builtins.c
+--- a/src/gcc/config/arm/thumb1.md
++++ b/src/gcc/config/arm/thumb1.md
+@@ -114,8 +114,8 @@
+ (set (match_dup 0)
+ (plus:SI (match_dup 0) (reg:SI SP_REGNUM)))]
+ "TARGET_THUMB1
+- && (unsigned HOST_WIDE_INT) (INTVAL (operands[1])) < 1024
+- && (INTVAL (operands[1]) & 3) == 0"
++ && UINTVAL (operands[1]) < 1024
++ && (UINTVAL (operands[1]) & 3) == 0"
+ [(set (match_dup 0) (plus:SI (reg:SI SP_REGNUM) (match_dup 1)))]
+ ""
+ )
+@@ -142,11 +142,11 @@
+ (set_attr "type" "alus_sreg")]
+ )
+
+-; Unfortunately with the Thumb the '&'/'0' trick can fails when operands
+-; 1 and 2; are the same, because reload will make operand 0 match
+-; operand 1 without realizing that this conflicts with operand 2. We fix
+-; this by adding another alternative to match this case, and then `reload'
+-; it ourselves. This alternative must come first.
++;; Unfortunately on Thumb the '&'/'0' trick can fail when operands
++;; 1 and 2 are the same, because reload will make operand 0 match
++;; operand 1 without realizing that this conflicts with operand 2. We fix
++;; this by adding another alternative to match this case, and then `reload'
++;; it ourselves. This alternative must come first.
+ (define_insn "*thumb_mulsi3"
+ [(set (match_operand:SI 0 "register_operand" "=&l,&l,&l")
+ (mult:SI (match_operand:SI 1 "register_operand" "%l,*h,0")
+@@ -590,8 +590,8 @@
+ ;;; ??? The 'i' constraint looks funny, but it should always be replaced by
+ ;;; thumb_reorg with a memory reference.
+ (define_insn "*thumb1_movdi_insn"
+- [(set (match_operand:DI 0 "nonimmediate_operand" "=l,l,l,l,>,l, m,*r")
+- (match_operand:DI 1 "general_operand" "l, I,J,>,l,mi,l,*r"))]
++ [(set (match_operand:DI 0 "nonimmediate_operand" "=l,l,l,r,l,>,l, m,*r")
++ (match_operand:DI 1 "general_operand" "l, I,J,j,>,l,mi,l,*r"))]
+ "TARGET_THUMB1
+ && ( register_operand (operands[0], DImode)
+ || register_operand (operands[1], DImode))"
+@@ -610,36 +610,41 @@
+ operands[1] = GEN_INT (- INTVAL (operands[1]));
+ return \"movs\\t%Q0, %1\;rsbs\\t%Q0, %Q0, #0\;asrs\\t%R0, %Q0, #31\";
+ case 3:
+- return \"ldmia\\t%1, {%0, %H0}\";
++ gcc_assert (TARGET_HAVE_MOVT);
++ return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
+ case 4:
+- return \"stmia\\t%0, {%1, %H1}\";
++ return \"ldmia\\t%1, {%0, %H0}\";
+ case 5:
+- return thumb_load_double_from_address (operands);
++ return \"stmia\\t%0, {%1, %H1}\";
+ case 6:
++ return thumb_load_double_from_address (operands);
++ case 7:
+ operands[2] = gen_rtx_MEM (SImode,
+ plus_constant (Pmode, XEXP (operands[0], 0), 4));
+ output_asm_insn (\"str\\t%1, %0\;str\\t%H1, %2\", operands);
+ return \"\";
+- case 7:
++ case 8:
+ if (REGNO (operands[1]) == REGNO (operands[0]) + 1)
+ return \"mov\\t%0, %1\;mov\\t%H0, %H1\";
+ return \"mov\\t%H0, %H1\;mov\\t%0, %1\";
+ }
+ }"
+- [(set_attr "length" "4,4,6,2,2,6,4,4")
+- (set_attr "type" "multiple,multiple,multiple,load2,store2,load2,store2,multiple")
+- (set_attr "pool_range" "*,*,*,*,*,1018,*,*")]
++ [(set_attr "length" "4,4,6,6,2,2,6,4,4")
++ (set_attr "type" "multiple,multiple,multiple,multiple,load2,store2,load2,store2,multiple")
++ (set_attr "arch" "t1,t1,t1,v8mb,t1,t1,t1,t1,t1")
++ (set_attr "pool_range" "*,*,*,*,*,*,1018,*,*")]
+ )
+
+ (define_insn "*thumb1_movsi_insn"
+- [(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,l,l,l,>,l, m,*l*h*k")
+- (match_operand:SI 1 "general_operand" "l, I,J,K,>,l,mi,l,*l*h*k"))]
++ [(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,r,l,l,l,>,l, m,*l*h*k")
++ (match_operand:SI 1 "general_operand" "l, I,j,J,K,>,l,mi,l,*l*h*k"))]
+ "TARGET_THUMB1
+ && ( register_operand (operands[0], SImode)
+ || register_operand (operands[1], SImode))"
+ "@
+ movs %0, %1
+ movs %0, %1
++ movw %0, %1
+ #
+ #
+ ldmia\\t%1, {%0}
+@@ -647,10 +652,11 @@
+ ldr\\t%0, %1
+ str\\t%1, %0
+ mov\\t%0, %1"
+- [(set_attr "length" "2,2,4,4,2,2,2,2,2")
+- (set_attr "type" "mov_reg,mov_imm,multiple,multiple,load1,store1,load1,store1,mov_reg")
+- (set_attr "pool_range" "*,*,*,*,*,*,1018,*,*")
+- (set_attr "conds" "set,clob,*,*,nocond,nocond,nocond,nocond,nocond")])
++ [(set_attr "length" "2,2,4,4,4,2,2,2,2,2")
++ (set_attr "type" "mov_reg,mov_imm,mov_imm,multiple,multiple,load1,store1,load1,store1,mov_reg")
++ (set_attr "pool_range" "*,*,*,*,*,*,*,1018,*,*")
++ (set_attr "arch" "t1,t1,v8mb,t1,t1,t1,t1,t1,t1,t1")
++ (set_attr "conds" "set,clob,nocond,*,*,nocond,nocond,nocond,nocond,nocond")])
+
+ ; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
+ ; to see if we can load them in fewer instructions or fewer cycles.
+@@ -687,7 +693,8 @@
+ (define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (match_operand:SI 1 "const_int_operand" ""))]
+- "TARGET_THUMB1 && satisfies_constraint_K (operands[1])"
++ "TARGET_THUMB1 && satisfies_constraint_K (operands[1])
++ && !(TARGET_HAVE_MOVT && satisfies_constraint_j (operands[1]))"
+ [(set (match_dup 2) (match_dup 1))
+ (set (match_dup 0) (ashift:SI (match_dup 2) (match_dup 3)))]
+ "
+@@ -714,7 +721,8 @@
+ (define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (match_operand:SI 1 "const_int_operand" ""))]
+- "TARGET_THUMB1 && satisfies_constraint_Pe (operands[1])"
++ "TARGET_THUMB1 && satisfies_constraint_Pe (operands[1])
++ && !(TARGET_HAVE_MOVT && satisfies_constraint_j (operands[1]))"
+ [(set (match_dup 2) (match_dup 1))
+ (set (match_dup 0) (plus:SI (match_dup 2) (match_dup 3)))]
+ "
+@@ -726,8 +734,8 @@
+ )
+
+ (define_insn "*thumb1_movhi_insn"
+- [(set (match_operand:HI 0 "nonimmediate_operand" "=l,l,m,l*r,*h,l")
+- (match_operand:HI 1 "general_operand" "l,m,l,k*h,*r,I"))]
++ [(set (match_operand:HI 0 "nonimmediate_operand" "=l,l,m,l*r,*h,l,r")
++ (match_operand:HI 1 "general_operand" "l,m,l,k*h,*r,I,n"))]
+ "TARGET_THUMB1
+ && ( register_operand (operands[0], HImode)
+ || register_operand (operands[1], HImode))"
+@@ -739,6 +747,8 @@
+ case 3: return \"mov %0, %1\";
+ case 4: return \"mov %0, %1\";
+ case 5: return \"movs %0, %1\";
++ case 6: gcc_assert (TARGET_HAVE_MOVT);
++ return \"movw %0, %L1\";
+ default: gcc_unreachable ();
+ case 1:
+ /* The stack pointer can end up being taken as an index register.
+@@ -758,9 +768,10 @@
+ }
+ return \"ldrh %0, %1\";
+ }"
+- [(set_attr "length" "2,4,2,2,2,2")
+- (set_attr "type" "alus_imm,load1,store1,mov_reg,mov_reg,mov_imm")
+- (set_attr "conds" "clob,nocond,nocond,nocond,nocond,clob")])
++ [(set_attr "length" "2,4,2,2,2,2,4")
++ (set_attr "type" "alus_imm,load1,store1,mov_reg,mov_reg,mov_imm,mov_imm")
++ (set_attr "arch" "t1,t1,t1,t1,t1,t1,v8mb")
++ (set_attr "conds" "clob,nocond,nocond,nocond,nocond,clob,nocond")])
+
+ (define_expand "thumb_movhi_clobber"
+ [(set (match_operand:HI 0 "memory_operand" "")
+@@ -963,6 +974,91 @@
+ DONE;
+ })
+
++;; A pattern for the CB(N)Z instruction added in ARMv8-M Baseline profile,
++;; adapted from cbranchsi4_insn. Modifying cbranchsi4_insn instead leads to
++;; code generation difference for ARMv6-M because the minimum length of the
++;; instruction becomes 2 even for ARMv6-M due to a limitation in genattrtab's
++;; handling of PC in the length condition.
++(define_insn "thumb1_cbz"
++ [(set (pc) (if_then_else
++ (match_operator 0 "equality_operator"
++ [(match_operand:SI 1 "s_register_operand" "l")
++ (const_int 0)])
++ (label_ref (match_operand 2 "" ""))
++ (pc)))]
++ "TARGET_THUMB1 && TARGET_HAVE_CBZ"
++{
++ if (get_attr_length (insn) == 2)
++ {
++ if (GET_CODE (operands[0]) == EQ)
++ return "cbz\t%1, %l2";
++ else
++ return "cbnz\t%1, %l2";
++ }
++ else
++ {
++ rtx t = cfun->machine->thumb1_cc_insn;
++ if (t != NULL_RTX)
++ {
++ if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
++ || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
++ t = NULL_RTX;
++ if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
++ {
++ if (!noov_comparison_operator (operands[0], VOIDmode))
++ t = NULL_RTX;
++ }
++ else if (cfun->machine->thumb1_cc_mode != CCmode)
++ t = NULL_RTX;
++ }
++ if (t == NULL_RTX)
++ {
++ output_asm_insn ("cmp\t%1, #0", operands);
++ cfun->machine->thumb1_cc_insn = insn;
++ cfun->machine->thumb1_cc_op0 = operands[1];
++ cfun->machine->thumb1_cc_op1 = operands[2];
++ cfun->machine->thumb1_cc_mode = CCmode;
++ }
++ else
++ /* Ensure we emit the right type of condition code on the jump. */
++ XEXP (operands[0], 0) = gen_rtx_REG (cfun->machine->thumb1_cc_mode,
++ CC_REGNUM);
++
++ switch (get_attr_length (insn))
++ {
++ case 4: return "b%d0\t%l2";
++ case 6: return "b%D0\t.LCB%=;b\t%l2\t%@long jump\n.LCB%=:";
++ case 8: return "b%D0\t.LCB%=;bl\t%l2\t%@far jump\n.LCB%=:";
++ default: gcc_unreachable ();
++ }
++ }
++}
++ [(set (attr "far_jump")
++ (if_then_else
++ (eq_attr "length" "8")
++ (const_string "yes")
++ (const_string "no")))
++ (set (attr "length")
++ (if_then_else
++ (and (ge (minus (match_dup 2) (pc)) (const_int 2))
++ (le (minus (match_dup 2) (pc)) (const_int 128)))
++ (const_int 2)
++ (if_then_else
++ (and (ge (minus (match_dup 2) (pc)) (const_int -250))
++ (le (minus (match_dup 2) (pc)) (const_int 256)))
++ (const_int 4)
++ (if_then_else
++ (and (ge (minus (match_dup 2) (pc)) (const_int -2040))
++ (le (minus (match_dup 2) (pc)) (const_int 2048)))
++ (const_int 6)
++ (const_int 8)))))
++ (set (attr "type")
++ (if_then_else
++ (eq_attr "length" "2")
++ (const_string "branch")
++ (const_string "multiple")))]
++)
++
+ (define_insn "cbranchsi4_insn"
+ [(set (pc) (if_then_else
+ (match_operator 0 "arm_comparison_operator"
+--- a/src/gcc/config/arm/unspecs.md
++++ b/src/gcc/config/arm/unspecs.md
+@@ -191,6 +191,8 @@
+ UNSPEC_VBSL
+ UNSPEC_VCAGE
+ UNSPEC_VCAGT
++ UNSPEC_VCALE
++ UNSPEC_VCALT
+ UNSPEC_VCEQ
+ UNSPEC_VCGE
+ UNSPEC_VCGEU
+@@ -203,6 +205,20 @@
+ UNSPEC_VCVT_U
+ UNSPEC_VCVT_S_N
+ UNSPEC_VCVT_U_N
++ UNSPEC_VCVT_HF_S_N
++ UNSPEC_VCVT_HF_U_N
++ UNSPEC_VCVT_SI_S_N
++ UNSPEC_VCVT_SI_U_N
++ UNSPEC_VCVTH_S
++ UNSPEC_VCVTH_U
++ UNSPEC_VCVTA_S
++ UNSPEC_VCVTA_U
++ UNSPEC_VCVTM_S
++ UNSPEC_VCVTM_U
++ UNSPEC_VCVTN_S
++ UNSPEC_VCVTN_U
++ UNSPEC_VCVTP_S
++ UNSPEC_VCVTP_U
+ UNSPEC_VEXT
+ UNSPEC_VHADD_S
+ UNSPEC_VHADD_U
+@@ -244,6 +260,8 @@
+ UNSPEC_VMLSL_S_LANE
+ UNSPEC_VMLSL_U_LANE
+ UNSPEC_VMLSL_LANE
++ UNSPEC_VFMA_LANE
++ UNSPEC_VFMS_LANE
+ UNSPEC_VMOVL_S
+ UNSPEC_VMOVL_U
+ UNSPEC_VMOVN
+@@ -365,5 +383,11 @@
+ UNSPEC_NVRINTN
+ UNSPEC_VQRDMLAH
+ UNSPEC_VQRDMLSH
++ UNSPEC_VRND
++ UNSPEC_VRNDA
++ UNSPEC_VRNDI
++ UNSPEC_VRNDM
++ UNSPEC_VRNDN
++ UNSPEC_VRNDP
++ UNSPEC_VRNDX
+ ])
-
--#define vcvts_n_s32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32_t a_ = (a); \
-- int32_t result; \
-- __asm__ ("fcvtzs %s0,%s1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
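The asm-based vcvt*_n macros removed from the patch in this region implement the AArch64 fixed-point conversion intrinsics; the immediate operand is the number of fractional bits. A usage sketch, illustration only (the wrapper names are hypothetical and assume the standard arm_neon.h declarations):

#include <arm_neon.h>
#include <stdint.h>

/* Hypothetical Q16.16 helpers: the constant 16 selects 16 fractional bits.  */
static inline float32_t q16_to_float (int32_t q)   { return vcvts_n_f32_s32 (q, 16); }
static inline int32_t   float_to_q16 (float32_t f) { return vcvts_n_s32_f32 (f, 16); }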
+--- a/src/gcc/config/arm/vec-common.md
++++ b/src/gcc/config/arm/vec-common.md
+@@ -124,6 +124,20 @@
+ FAIL;
+ })
+
++(define_expand "vec_perm_const<mode>"
++ [(match_operand:VH 0 "s_register_operand")
++ (match_operand:VH 1 "s_register_operand")
++ (match_operand:VH 2 "s_register_operand")
++ (match_operand:<V_cmp_result> 3)]
++ "TARGET_NEON"
++{
++ if (arm_expand_vec_perm_const (operands[0], operands[1],
++ operands[2], operands[3]))
++ DONE;
++ else
++ FAIL;
++})
++
+ (define_expand "vec_perm<mode>"
+ [(match_operand:VE 0 "s_register_operand" "")
+ (match_operand:VE 1 "s_register_operand" "")
+--- a/src/gcc/config/arm/vfp.md
++++ b/src/gcc/config/arm/vfp.md
+@@ -18,6 +18,199 @@
+ ;; along with GCC; see the file COPYING3. If not see
+ ;; <http://www.gnu.org/licenses/>. */
+
++;; Patterns for HI moves which provide more data transfer instructions when VFP
++;; support is enabled.
++(define_insn "*arm_movhi_vfp"
++ [(set
++ (match_operand:HI 0 "nonimmediate_operand"
++ "=rk, r, r, m, r, *t, r, *t")
++ (match_operand:HI 1 "general_operand"
++ "rIk, K, n, r, mi, r, *t, *t"))]
++ "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
++ && !TARGET_VFP_FP16INST
++ && (register_operand (operands[0], HImode)
++ || register_operand (operands[1], HImode))"
++{
++ switch (which_alternative)
++ {
++ case 0:
++ return "mov%?\t%0, %1\t%@ movhi";
++ case 1:
++ return "mvn%?\t%0, #%B1\t%@ movhi";
++ case 2:
++ return "movw%?\t%0, %L1\t%@ movhi";
++ case 3:
++ return "strh%?\t%1, %0\t%@ movhi";
++ case 4:
++ return "ldrh%?\t%0, %1\t%@ movhi";
++ case 5:
++ case 6:
++ return "vmov%?\t%0, %1\t%@ int";
++ case 7:
++ return "vmov%?.f32\t%0, %1\t%@ int";
++ default:
++ gcc_unreachable ();
++ }
++}
++ [(set_attr "predicable" "yes")
++ (set_attr_alternative "type"
++ [(if_then_else
++ (match_operand 1 "const_int_operand" "")
++ (const_string "mov_imm")
++ (const_string "mov_reg"))
++ (const_string "mvn_imm")
++ (const_string "mov_imm")
++ (const_string "store1")
++ (const_string "load1")
++ (const_string "f_mcr")
++ (const_string "f_mrc")
++ (const_string "fmov")])
++ (set_attr "arch" "*, *, v6t2, *, *, *, *, *")
++ (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
++ (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
++ (set_attr "length" "4")]
++)
++
++(define_insn "*thumb2_movhi_vfp"
++ [(set
++ (match_operand:HI 0 "nonimmediate_operand"
++ "=rk, r, l, r, m, r, *t, r, *t")
++ (match_operand:HI 1 "general_operand"
++ "rk, I, Py, n, r, m, r, *t, *t"))]
++ "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
++ && !TARGET_VFP_FP16INST
++ && (register_operand (operands[0], HImode)
++ || register_operand (operands[1], HImode))"
++{
++ switch (which_alternative)
++ {
++ case 0:
++ case 1:
++ case 2:
++ return "mov%?\t%0, %1\t%@ movhi";
++ case 3:
++ return "movw%?\t%0, %L1\t%@ movhi";
++ case 4:
++ return "strh%?\t%1, %0\t%@ movhi";
++ case 5:
++ return "ldrh%?\t%0, %1\t%@ movhi";
++ case 6:
++ case 7:
++ return "vmov%?\t%0, %1\t%@ int";
++ case 8:
++ return "vmov%?.f32\t%0, %1\t%@ int";
++ default:
++ gcc_unreachable ();
++ }
++}
++ [(set_attr "predicable" "yes")
++ (set_attr "predicable_short_it"
++ "yes, no, yes, no, no, no, no, no, no")
++ (set_attr "type"
++ "mov_reg, mov_imm, mov_imm, mov_imm, store1, load1,\
++ f_mcr, f_mrc, fmov")
++ (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
++ (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
++ (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
++ (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
++)
++
++;; Patterns for HI moves which provide more data transfer instructions when FP16
++;; instructions are available.
++(define_insn "*arm_movhi_fp16"
++ [(set
++ (match_operand:HI 0 "nonimmediate_operand"
++ "=r, r, r, m, r, *t, r, *t")
++ (match_operand:HI 1 "general_operand"
++ "rIk, K, n, r, mi, r, *t, *t"))]
++ "TARGET_ARM && TARGET_VFP_FP16INST
++ && (register_operand (operands[0], HImode)
++ || register_operand (operands[1], HImode))"
++{
++ switch (which_alternative)
++ {
++ case 0:
++ return "mov%?\t%0, %1\t%@ movhi";
++ case 1:
++ return "mvn%?\t%0, #%B1\t%@ movhi";
++ case 2:
++ return "movw%?\t%0, %L1\t%@ movhi";
++ case 3:
++ return "strh%?\t%1, %0\t%@ movhi";
++ case 4:
++ return "ldrh%?\t%0, %1\t%@ movhi";
++ case 5:
++ case 6:
++ return "vmov.f16\t%0, %1\t%@ int";
++ case 7:
++ return "vmov%?.f32\t%0, %1\t%@ int";
++ default:
++ gcc_unreachable ();
++ }
++}
++ [(set_attr "predicable" "yes, yes, yes, yes, yes, no, no, yes")
++ (set_attr_alternative "type"
++ [(if_then_else
++ (match_operand 1 "const_int_operand" "")
++ (const_string "mov_imm")
++ (const_string "mov_reg"))
++ (const_string "mvn_imm")
++ (const_string "mov_imm")
++ (const_string "store1")
++ (const_string "load1")
++ (const_string "f_mcr")
++ (const_string "f_mrc")
++ (const_string "fmov")])
++ (set_attr "arch" "*, *, v6t2, *, *, *, *, *")
++ (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
++ (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
++ (set_attr "length" "4")]
++)
++
++(define_insn "*thumb2_movhi_fp16"
++ [(set
++ (match_operand:HI 0 "nonimmediate_operand"
++ "=rk, r, l, r, m, r, *t, r, *t")
++ (match_operand:HI 1 "general_operand"
++ "rk, I, Py, n, r, m, r, *t, *t"))]
++ "TARGET_THUMB2 && TARGET_VFP_FP16INST
++ && (register_operand (operands[0], HImode)
++ || register_operand (operands[1], HImode))"
++{
++ switch (which_alternative)
++ {
++ case 0:
++ case 1:
++ case 2:
++ return "mov%?\t%0, %1\t%@ movhi";
++ case 3:
++ return "movw%?\t%0, %L1\t%@ movhi";
++ case 4:
++ return "strh%?\t%1, %0\t%@ movhi";
++ case 5:
++ return "ldrh%?\t%0, %1\t%@ movhi";
++ case 6:
++ case 7:
++ return "vmov.f16\t%0, %1\t%@ int";
++ case 8:
++ return "vmov%?.f32\t%0, %1\t%@ int";
++ default:
++ gcc_unreachable ();
++ }
++}
++ [(set_attr "predicable"
++ "yes, yes, yes, yes, yes, yes, no, no, yes")
++ (set_attr "predicable_short_it"
++ "yes, no, yes, no, no, no, no, no, no")
++ (set_attr "type"
++ "mov_reg, mov_imm, mov_imm, mov_imm, store1, load1,\
++ f_mcr, f_mrc, fmov")
++ (set_attr "arch" "*, *, *, v6t2, *, *, *, *, *")
++ (set_attr "pool_range" "*, *, *, *, *, 4094, *, *, *")
++ (set_attr "neg_pool_range" "*, *, *, *, *, 250, *, *, *")
++ (set_attr "length" "2, 4, 2, 4, 4, 4, 4, 4, 4")]
++)
++
+ ;; SImode moves
+ ;; ??? For now do not allow loading constants into vfp regs. This causes
+ ;; problems because small constants get converted into adds.
+@@ -53,7 +246,8 @@
+ }
+ "
+ [(set_attr "predicable" "yes")
+- (set_attr "type" "mov_reg,mov_reg,mvn_imm,mov_imm,load1,store1,f_mcr,f_mrc,fmov,f_loads,f_stores")
++ (set_attr "type" "mov_reg,mov_reg,mvn_imm,mov_imm,load1,store1,
++ f_mcr,f_mrc,fmov,f_loads,f_stores")
+ (set_attr "pool_range" "*,*,*,*,4096,*,*,*,*,1020,*")
+ (set_attr "neg_pool_range" "*,*,*,*,4084,*,*,*,*,1008,*")]
+ )
+@@ -211,10 +405,87 @@
+ )
+
+ ;; HFmode moves
++
++(define_insn "*movhf_vfp_fp16"
++ [(set (match_operand:HF 0 "nonimmediate_operand"
++ "= r,m,t,r,t,r,t,t,Um,r")
++ (match_operand:HF 1 "general_operand"
++ " m,r,t,r,r,t,Dv,Um,t,F"))]
++ "TARGET_32BIT
++ && TARGET_VFP_FP16INST
++ && (s_register_operand (operands[0], HFmode)
++ || s_register_operand (operands[1], HFmode))"
++ {
++ switch (which_alternative)
++ {
++ case 0: /* ARM register from memory. */
++ return \"ldrh%?\\t%0, %1\\t%@ __fp16\";
++ case 1: /* Memory from ARM register. */
++ return \"strh%?\\t%1, %0\\t%@ __fp16\";
++ case 2: /* S register from S register. */
++ return \"vmov\\t%0, %1\t%@ __fp16\";
++ case 3: /* ARM register from ARM register. */
++ return \"mov%?\\t%0, %1\\t%@ __fp16\";
++ case 4: /* S register from ARM register. */
++ case 5: /* ARM register from S register. */
++ case 6: /* S register from immediate. */
++ return \"vmov.f16\\t%0, %1\t%@ __fp16\";
++ case 7: /* S register from memory. */
++ return \"vld1.16\\t{%z0}, %A1\";
++ case 8: /* Memory from S register. */
++ return \"vst1.16\\t{%z1}, %A0\";
++ case 9: /* ARM register from constant. */
++ {
++ long bits;
++ rtx ops[4];
++
++ bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
++ HFmode);
++ ops[0] = operands[0];
++ ops[1] = GEN_INT (bits);
++ ops[2] = GEN_INT (bits & 0xff00);
++ ops[3] = GEN_INT (bits & 0x00ff);
++
++ if (arm_arch_thumb2)
++ output_asm_insn (\"movw\\t%0, %1\", ops);
++ else
++ output_asm_insn (\"mov\\t%0, %2\;orr\\t%0, %0, %3\", ops);
++ return \"\";
++ }
++ default:
++ gcc_unreachable ();
++ }
++ }
++ [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
++ (set_attr "predicable_short_it" "no, no, no, yes,\
++ no, no, no, no,\
++ no, no")
++ (set_attr_alternative "type"
++ [(const_string "load1") (const_string "store1")
++ (const_string "fmov") (const_string "mov_reg")
++ (const_string "f_mcr") (const_string "f_mrc")
++ (const_string "fconsts") (const_string "neon_load1_1reg")
++ (const_string "neon_store1_1reg")
++ (if_then_else (match_test "arm_arch_thumb2")
++ (const_string "mov_imm")
++ (const_string "multiple"))])
++ (set_attr_alternative "length"
++ [(const_int 4) (const_int 4)
++ (const_int 4) (const_int 4)
++ (const_int 4) (const_int 4)
++ (const_int 4) (const_int 4)
++ (const_int 4)
++ (if_then_else (match_test "arm_arch_thumb2")
++ (const_int 4)
++ (const_int 8))])]
++)
++
+ (define_insn "*movhf_vfp_neon"
+ [(set (match_operand:HF 0 "nonimmediate_operand" "= t,Um,r,m,t,r,t,r,r")
+ (match_operand:HF 1 "general_operand" " Um, t,m,r,t,r,r,t,F"))]
+- "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_NEON_FP16
++ "TARGET_32BIT
++ && TARGET_HARD_FLOAT && TARGET_NEON_FP16
++ && !TARGET_VFP_FP16INST
+ && ( s_register_operand (operands[0], HFmode)
+ || s_register_operand (operands[1], HFmode))"
+ "*
+@@ -268,7 +539,10 @@
+ (define_insn "*movhf_vfp"
+ [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,t,r,t,r,r")
+ (match_operand:HF 1 "general_operand" " m,r,t,r,r,t,F"))]
+- "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16 && !TARGET_NEON_FP16
++ "TARGET_32BIT
++ && TARGET_HARD_FLOAT && TARGET_VFP
++ && !TARGET_NEON_FP16
++ && !TARGET_VFP_FP16INST
+ && ( s_register_operand (operands[0], HFmode)
+ || s_register_operand (operands[1], HFmode))"
+ "*
+@@ -394,8 +668,8 @@
+ ;; DFmode moves
+
+ (define_insn "*movdf_vfp"
+- [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w ,Uv,r, m,w,r")
+- (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,UvF,w ,mF,r,w,r"))]
++ [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w ,Uv,r, m,w,r")
++ (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,G,UvF,w ,mF,r,w,r"))]
+ "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
+ && ( register_operand (operands[0], DFmode)
+ || register_operand (operands[1], DFmode))"
+@@ -410,39 +684,43 @@
+ case 2:
+ gcc_assert (TARGET_VFP_DOUBLE);
+ return \"vmov%?.f64\\t%P0, %1\";
+- case 3: case 4:
++ case 3:
++ gcc_assert (TARGET_VFP_DOUBLE);
++ return \"vmov.i64\\t%P0, #0\\t%@ float\";
++ case 4: case 5:
+ return output_move_vfp (operands);
+- case 5: case 6:
++ case 6: case 7:
+ return output_move_double (operands, true, NULL);
+- case 7:
++ case 8:
+ if (TARGET_VFP_SINGLE)
+ return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\";
+ else
+ return \"vmov%?.f64\\t%P0, %P1\";
+- case 8:
++ case 9:
+ return \"#\";
+ default:
+ gcc_unreachable ();
+ }
+ }
+ "
+- [(set_attr "type" "f_mcrr,f_mrrc,fconstd,f_loadd,f_stored,\
++ [(set_attr "type" "f_mcrr,f_mrrc,fconstd,neon_move,f_loadd,f_stored,\
+ load2,store2,ffarithd,multiple")
+- (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
+- (eq_attr "alternative" "7")
++ (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8)
++ (eq_attr "alternative" "8")
+ (if_then_else
+ (match_test "TARGET_VFP_SINGLE")
+ (const_int 8)
+ (const_int 4))]
+ (const_int 4)))
+- (set_attr "predicable" "yes")
+- (set_attr "pool_range" "*,*,*,1020,*,1020,*,*,*")
+- (set_attr "neg_pool_range" "*,*,*,1004,*,1004,*,*,*")]
++ (set_attr "predicable" "yes,yes,yes,no,yes,yes,yes,yes,yes,yes")
++ (set_attr "pool_range" "*,*,*,*,1020,*,1020,*,*,*")
++ (set_attr "neg_pool_range" "*,*,*,*,1004,*,1004,*,*,*")
++ (set_attr "arch" "any,any,any,neon,any,any,any,any,any,any")]
+ )
+
+ (define_insn "*thumb2_movdf_vfp"
+- [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w ,Uv,r ,m,w,r")
+- (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,UvF,w, mF,r, w,r"))]
++ [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w ,Uv,r ,m,w,r")
++ (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,G,UvF,w, mF,r, w,r"))]
+ "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
+ && ( register_operand (operands[0], DFmode)
+ || register_operand (operands[1], DFmode))"
+@@ -457,11 +735,14 @@
+ case 2:
+ gcc_assert (TARGET_VFP_DOUBLE);
+ return \"vmov%?.f64\\t%P0, %1\";
+- case 3: case 4:
++ case 3:
++ gcc_assert (TARGET_VFP_DOUBLE);
++ return \"vmov.i64\\t%P0, #0\\t%@ float\";
++ case 4: case 5:
+ return output_move_vfp (operands);
+- case 5: case 6: case 8:
++ case 6: case 7: case 9:
+ return output_move_double (operands, true, NULL);
+- case 7:
++ case 8:
+ if (TARGET_VFP_SINGLE)
+ return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\";
+ else
+@@ -471,17 +752,18 @@
+ }
+ }
+ "
+- [(set_attr "type" "f_mcrr,f_mrrc,fconstd,f_loadd,\
++ [(set_attr "type" "f_mcrr,f_mrrc,fconstd,neon_move,f_loadd,\
+ f_stored,load2,store2,ffarithd,multiple")
+- (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
+- (eq_attr "alternative" "7")
++ (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8)
++ (eq_attr "alternative" "8")
+ (if_then_else
+ (match_test "TARGET_VFP_SINGLE")
+ (const_int 8)
+ (const_int 4))]
+ (const_int 4)))
+- (set_attr "pool_range" "*,*,*,1018,*,4094,*,*,*")
+- (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
++ (set_attr "pool_range" "*,*,*,*,1018,*,4094,*,*,*")
++ (set_attr "neg_pool_range" "*,*,*,*,1008,*,0,*,*,*")
++ (set_attr "arch" "any,any,any,neon,any,any,any,any,any,any")]
+ )
+
+
+@@ -661,9 +943,63 @@
+ (set_attr "type" "ffarithd")]
+ )
+
++;; ABS and NEG for FP16.
++(define_insn "<absneg_str>hf2"
++ [(set (match_operand:HF 0 "s_register_operand" "=w")
++ (ABSNEG:HF (match_operand:HF 1 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "v<absneg_str>.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "ffariths")]
++)
++
++(define_expand "neon_vabshf"
++ [(set
++ (match_operand:HF 0 "s_register_operand")
++ (abs:HF (match_operand:HF 1 "s_register_operand")))]
++ "TARGET_VFP_FP16INST"
++{
++ emit_insn (gen_abshf2 (operands[0], operands[1]));
++ DONE;
++})
++
++;; VRND for FP16.
++(define_insn "neon_v<fp16_rnd_str>hf"
++ [(set (match_operand:HF 0 "s_register_operand" "=w")
++ (unspec:HF
++ [(match_operand:HF 1 "s_register_operand" "w")]
++ FP16_RND))]
++ "TARGET_VFP_FP16INST"
++ "<fp16_rnd_insn>.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "neon_fp_round_s")]
++)
++
++(define_insn "neon_vrndihf"
++ [(set (match_operand:HF 0 "s_register_operand" "=w")
++ (unspec:HF
++ [(match_operand:HF 1 "s_register_operand" "w")]
++ UNSPEC_VRNDI))]
++ "TARGET_VFP_FP16INST"
++ "vrintr.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "neon_fp_round_s")]
++)
+
+ ;; Arithmetic insns
+
++(define_insn "addhf3"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (plus:HF
++ (match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "vadd.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fadds")]
++)
++
+ (define_insn "*addsf3_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (plus:SF (match_operand:SF 1 "s_register_operand" "t")
+@@ -686,6 +1022,17 @@
+ (set_attr "type" "faddd")]
+ )
+
++(define_insn "subhf3"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (minus:HF
++ (match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "vsub.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fadds")]
++)
+
+ (define_insn "*subsf3_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+@@ -712,6 +1059,19 @@
+
+ ;; Division insns
+
++;; FP16 Division.
++(define_insn "divhf3"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (div:HF
++ (match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "vdiv.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fdivs")]
++)
++
+ ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
+ ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
+ ; earlier.
+@@ -742,6 +1102,17 @@
+
+ ;; Multiplication insns
+
++(define_insn "mulhf3"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (mult:HF (match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "vmul.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmuls")]
++)
++
+ (define_insn "*mulsf3_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (mult:SF (match_operand:SF 1 "s_register_operand" "t")
+@@ -764,6 +1135,26 @@
+ (set_attr "type" "fmuld")]
+ )
+
++(define_insn "*mulsf3neghf_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (mult:HF (neg:HF (match_operand:HF 1 "s_register_operand" "t"))
++ (match_operand:HF 2 "s_register_operand" "t")))]
++ "TARGET_VFP_FP16INST && !flag_rounding_math"
++ "vnmul.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmuls")]
++)
++
++(define_insn "*negmulhf3_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (neg:HF (mult:HF (match_operand:HF 1 "s_register_operand" "t")
++ (match_operand:HF 2 "s_register_operand" "t"))))]
++ "TARGET_VFP_FP16INST"
++ "vnmul.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmuls")]
++)
++
+ (define_insn "*mulsf3negsf_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
+@@ -813,6 +1204,18 @@
+ ;; Multiply-accumulate insns
+
+ ;; 0 = 1 * 2 + 0
++(define_insn "*mulsf3addhf_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (plus:HF
++ (mult:HF (match_operand:HF 2 "s_register_operand" "t")
++ (match_operand:HF 3 "s_register_operand" "t"))
++ (match_operand:HF 1 "s_register_operand" "0")))]
++ "TARGET_VFP_FP16INST"
++ "vmla.f16\\t%0, %2, %3"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmacs")]
++)
++
+ (define_insn "*mulsf3addsf_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (plus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
+@@ -838,6 +1241,17 @@
+ )
+
+ ;; 0 = 1 * 2 - 0
++(define_insn "*mulhf3subhf_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (minus:HF (mult:HF (match_operand:HF 2 "s_register_operand" "t")
++ (match_operand:HF 3 "s_register_operand" "t"))
++ (match_operand:HF 1 "s_register_operand" "0")))]
++ "TARGET_VFP_FP16INST"
++ "vnmls.f16\\t%0, %2, %3"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmacs")]
++)
++
+ (define_insn "*mulsf3subsf_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (minus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
+@@ -863,6 +1277,17 @@
+ )
+
+ ;; 0 = -(1 * 2) + 0
++(define_insn "*mulhf3neghfaddhf_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (minus:HF (match_operand:HF 1 "s_register_operand" "0")
++ (mult:HF (match_operand:HF 2 "s_register_operand" "t")
++ (match_operand:HF 3 "s_register_operand" "t"))))]
++ "TARGET_VFP_FP16INST"
++ "vmls.f16\\t%0, %2, %3"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmacs")]
++)
++
+ (define_insn "*mulsf3negsfaddsf_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (minus:SF (match_operand:SF 1 "s_register_operand" "0")
+@@ -889,6 +1314,18 @@
+
+
+ ;; 0 = -(1 * 2) - 0
++(define_insn "*mulhf3neghfsubhf_vfp"
++ [(set (match_operand:HF 0 "s_register_operand" "=t")
++ (minus:HF (mult:HF
++ (neg:HF (match_operand:HF 2 "s_register_operand" "t"))
++ (match_operand:HF 3 "s_register_operand" "t"))
++ (match_operand:HF 1 "s_register_operand" "0")))]
++ "TARGET_VFP_FP16INST"
++ "vnmla.f16\\t%0, %2, %3"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fmacs")]
++)
++
+ (define_insn "*mulsf3negsfsubsf_vfp"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (minus:SF (mult:SF
+@@ -917,6 +1354,30 @@
+
+ ;; Fused-multiply-accumulate
+
++(define_insn "fmahf4"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (fma:HF
++ (match_operand:HF 1 "register_operand" "w")
++ (match_operand:HF 2 "register_operand" "w")
++ (match_operand:HF 3 "register_operand" "0")))]
++ "TARGET_VFP_FP16INST"
++ "vfma.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "ffmas")]
++)
++
++(define_expand "neon_vfmahf"
++ [(match_operand:HF 0 "s_register_operand")
++ (match_operand:HF 1 "s_register_operand")
++ (match_operand:HF 2 "s_register_operand")
++ (match_operand:HF 3 "s_register_operand")]
++ "TARGET_VFP_FP16INST"
++{
++ emit_insn (gen_fmahf4 (operands[0], operands[2], operands[3],
++ operands[1]));
++ DONE;
++})
++
+ (define_insn "fma<SDF:mode>4"
+ [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
+ (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
+@@ -929,6 +1390,30 @@
+ (set_attr "type" "ffma<vfp_type>")]
+ )
+
++(define_insn "fmsubhf4_fp16"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (fma:HF
++ (neg:HF (match_operand:HF 1 "register_operand" "w"))
++ (match_operand:HF 2 "register_operand" "w")
++ (match_operand:HF 3 "register_operand" "0")))]
++ "TARGET_VFP_FP16INST"
++ "vfms.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "ffmas")]
++)
++
++(define_expand "neon_vfmshf"
++ [(match_operand:HF 0 "s_register_operand")
++ (match_operand:HF 1 "s_register_operand")
++ (match_operand:HF 2 "s_register_operand")
++ (match_operand:HF 3 "s_register_operand")]
++ "TARGET_VFP_FP16INST"
++{
++ emit_insn (gen_fmsubhf4_fp16 (operands[0], operands[2], operands[3],
++ operands[1]));
++ DONE;
++})
++
+ (define_insn "*fmsub<SDF:mode>4"
+ [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
+ (fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
+@@ -942,6 +1427,17 @@
+ (set_attr "type" "ffma<vfp_type>")]
+ )
+
++(define_insn "*fnmsubhf4"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (fma:HF (match_operand:HF 1 "register_operand" "w")
++ (match_operand:HF 2 "register_operand" "w")
++ (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
++ "TARGET_VFP_FP16INST"
++ "vfnms.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "ffmas")]
++)
++
+ (define_insn "*fnmsub<SDF:mode>4"
+ [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
+ (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
+@@ -954,6 +1450,17 @@
+ (set_attr "type" "ffma<vfp_type>")]
+ )
+
++(define_insn "*fnmaddhf4"
++ [(set (match_operand:HF 0 "register_operand" "=w")
++ (fma:HF (neg:HF (match_operand:HF 1 "register_operand" "w"))
++ (match_operand:HF 2 "register_operand" "w")
++ (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
++ "TARGET_VFP_FP16INST"
++ "vfnma.f16\\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "ffmas")]
++)
++
+ (define_insn "*fnmadd<SDF:mode>4"
+ [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
+ (fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
+@@ -993,7 +1500,7 @@
+ (define_insn "extendhfsf2"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (float_extend:SF (match_operand:HF 1 "s_register_operand" "t")))]
+- "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
++ "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
+ "vcvtb%?.f32.f16\\t%0, %1"
+ [(set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+@@ -1003,7 +1510,7 @@
+ (define_insn "truncsfhf2"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+ (float_truncate:HF (match_operand:SF 1 "s_register_operand" "t")))]
+- "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FP16"
++ "TARGET_32BIT && TARGET_HARD_FLOAT && (TARGET_FP16 || TARGET_VFP_FP16INST)"
+ "vcvtb%?.f16.f32\\t%0, %1"
+ [(set_attr "predicable" "yes")
+ (set_attr "predicable_short_it" "no")
+@@ -1096,6 +1603,27 @@
+
+ ;; Sqrt insns.
+
++(define_insn "neon_vsqrthf"
++ [(set (match_operand:HF 0 "s_register_operand" "=w")
++ (sqrt:HF (match_operand:HF 1 "s_register_operand" "w")))]
++ "TARGET_VFP_FP16INST"
++ "vsqrt.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fsqrts")]
++)
++
++(define_insn "neon_vrsqrtshf"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (unspec:HF [(match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")]
++ UNSPEC_VRSQRTS))]
++ "TARGET_VFP_FP16INST"
++ "vrsqrts.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "fsqrts")]
++)
++
+ ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
+ ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
+ ; earlier.
+@@ -1252,9 +1780,6 @@
+ )
+
+ ;; Fixed point to floating point conversions.
+-(define_code_iterator FCVT [unsigned_float float])
+-(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
-
--#define vcvts_n_u32_f32(a, b) \
-- __extension__ \
-- ({ \
-- float32_t a_ = (a); \
-- uint32_t result; \
-- __asm__ ("fcvtzu %s0,%s1,%2" \
-- : "=w"(result) \
-- : "w"(a_), "i"(b) \
-- : /* No clobbers */); \
-- result; \
-- })
+ (define_insn "*combine_vcvt_f32_<FCVTI32typename>"
+ [(set (match_operand:SF 0 "s_register_operand" "=t")
+ (mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
+@@ -1299,6 +1824,125 @@
+ (set_attr "type" "f_cvtf2i")]
+ )
+
++;; FP16 conversions.
++(define_insn "neon_vcvth<sup>hf"
++ [(set (match_operand:HF 0 "s_register_operand" "=w")
++ (unspec:HF
++ [(match_operand:SI 1 "s_register_operand" "w")]
++ VCVTH_US))]
++ "TARGET_VFP_FP16INST"
++ "vcvt.f16.<sup>%#32\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_cvti2f")]
++)
++
++(define_insn "neon_vcvth<sup>si"
++ [(set (match_operand:SI 0 "s_register_operand" "=w")
++ (unspec:SI
++ [(match_operand:HF 1 "s_register_operand" "w")]
++ VCVTH_US))]
++ "TARGET_VFP_FP16INST"
++ "vcvt.<sup>%#32.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_cvtf2i")]
++)
++
++;; The neon_vcvth<sup>_nhf patterns are used to generate the instruction for the
++;; vcvth_n_f16_<sup>32 arm_fp16 intrinsics. They are complicated by the
++;; hardware requirement that the source and destination registers are the same
++;; despite having different machine modes. The approach is to use a temporary
++;; register for the conversion and move that to the correct destination.
++
++;; Generate an unspec pattern for the intrinsic.
++(define_insn "neon_vcvth<sup>_nhf_unspec"
++ [(set
++ (match_operand:SI 0 "s_register_operand" "=w")
++ (unspec:SI
++ [(match_operand:SI 1 "s_register_operand" "0")
++ (match_operand:SI 2 "immediate_operand" "i")]
++ VCVT_HF_US_N))
++ (set
++ (match_operand:HF 3 "s_register_operand" "=w")
++ (float_truncate:HF (float:SF (match_dup 0))))]
++ "TARGET_VFP_FP16INST"
++{
++ neon_const_bounds (operands[2], 1, 33);
++ return "vcvt.f16.<sup>32\t%0, %0, %2\;vmov.f32\t%3, %0";
++}
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_cvti2f")]
++)
++
++;; Generate the instruction patterns needed for vcvth_n_f16_s32 neon intrinsics.
++(define_expand "neon_vcvth<sup>_nhf"
++ [(match_operand:HF 0 "s_register_operand")
++ (unspec:HF [(match_operand:SI 1 "s_register_operand")
++ (match_operand:SI 2 "immediate_operand")]
++ VCVT_HF_US_N)]
++"TARGET_VFP_FP16INST"
++{
++ rtx op1 = gen_reg_rtx (SImode);
++
++ neon_const_bounds (operands[2], 1, 33);
++
++ emit_move_insn (op1, operands[1]);
++ emit_insn (gen_neon_vcvth<sup>_nhf_unspec (op1, op1, operands[2],
++ operands[0]));
++ DONE;
++})
++
++;; The neon_vcvth<sup>_nsi patterns are used to generate the instruction for the
++;; vcvth_n_<sup>32_f16 arm_fp16 intrinsics. They have the same restrictions and
++;; are implemented in the same way as the neon_vcvth<sup>_nhf patterns.
++
++;; Generate an unspec pattern, constraining the registers.
++(define_insn "neon_vcvth<sup>_nsi_unspec"
++ [(set (match_operand:SI 0 "s_register_operand" "=w")
++ (unspec:SI
++ [(fix:SI
++ (fix:SF
++ (float_extend:SF
++ (match_operand:HF 1 "s_register_operand" "w"))))
++ (match_operand:SI 2 "immediate_operand" "i")]
++ VCVT_SI_US_N))]
++ "TARGET_VFP_FP16INST"
++{
++ neon_const_bounds (operands[2], 1, 33);
++ return "vmov.f32\t%0, %1\;vcvt.<sup>%#32.f16\t%0, %0, %2";
++}
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_cvtf2i")]
++)
++
++;; Generate the instruction patterns needed for the vcvth_n_<sup>32_f16 arm_fp16 intrinsics.
++(define_expand "neon_vcvth<sup>_nsi"
++ [(match_operand:SI 0 "s_register_operand")
++ (unspec:SI
++ [(match_operand:HF 1 "s_register_operand")
++ (match_operand:SI 2 "immediate_operand")]
++ VCVT_SI_US_N)]
++ "TARGET_VFP_FP16INST"
++{
++ rtx op1 = gen_reg_rtx (SImode);
++
++ neon_const_bounds (operands[2], 1, 33);
++ emit_insn (gen_neon_vcvth<sup>_nsi_unspec (op1, operands[1], operands[2]));
++ emit_move_insn (operands[0], op1);
++ DONE;
++})
++
++(define_insn "neon_vcvt<vcvth_op>h<sup>si"
++ [(set
++ (match_operand:SI 0 "s_register_operand" "=w")
++ (unspec:SI
++ [(match_operand:HF 1 "s_register_operand" "w")]
++ VCVT_HF_US))]
++ "TARGET_VFP_FP16INST"
++ "vcvt<vcvth_op>.<sup>%#32.f16\t%0, %1"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_cvtf2i")]
++)
++
+ ;; Store multiple insn used in function prologue.
+ (define_insn "*push_multi_vfp"
+ [(match_parallel 2 "multi_register_push"
+@@ -1368,6 +2012,20 @@
+ )
+
+ ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
++
++(define_insn "neon_<fmaxmin_op>hf"
++ [(set
++ (match_operand:HF 0 "s_register_operand" "=w")
++ (unspec:HF
++ [(match_operand:HF 1 "s_register_operand" "w")
++ (match_operand:HF 2 "s_register_operand" "w")]
++ VMAXMINFNM))]
++ "TARGET_VFP_FP16INST"
++ "<fmaxmin_op>.f16\t%0, %1, %2"
++ [(set_attr "conds" "unconditional")
++ (set_attr "type" "f_minmaxs")]
++)
++
+ (define_insn "<fmaxmin><mode>3"
+ [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
+ (unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
+--- a/src/gcc/config/linux.c
++++ b/src/gcc/config/linux.c
+@@ -26,7 +26,7 @@ along with GCC; see the file COPYING3. If not see
+ bool
+ linux_libc_has_function (enum function_class fn_class)
+ {
+- if (OPTION_GLIBC)
++ if (OPTION_GLIBC || OPTION_MUSL)
+ return true;
+ if (OPTION_BIONIC)
+ if (fn_class == function_c94
+--- a/src/gcc/configure
++++ b/src/gcc/configure
+@@ -1711,7 +1711,8 @@ Optional Packages:
+ --with-stabs arrange to use stabs instead of host debug format
+ --with-dwarf2 force the default debug format to be DWARF 2
+ --with-specs=SPECS add SPECS to driver command-line processing
+- --with-pkgversion=PKG Use PKG in the version string in place of "GCC"
++ --with-pkgversion=PKG Use PKG in the version string in place of "Linaro
++ GCC `cat $srcdir/LINARO-VERSION`"
+ --with-bugurl=URL Direct users to URL to report a bug
+ --with-multilib-list select multilibs (AArch64, SH and x86-64 only)
+ --with-gnu-ld assume the C compiler uses GNU ld default=no
+@@ -7651,7 +7652,7 @@ if test "${with_pkgversion+set}" = set; then :
+ *) PKGVERSION="($withval) " ;;
+ esac
+ else
+- PKGVERSION="(GCC) "
++ PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
+
+ fi
+
+@@ -18453,7 +18454,7 @@ else
+ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
+ lt_status=$lt_dlunknown
+ cat > conftest.$ac_ext <<_LT_EOF
+-#line 18456 "configure"
++#line 18457 "configure"
+ #include "confdefs.h"
+
+ #if HAVE_DLFCN_H
+@@ -18559,7 +18560,7 @@ else
+ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
+ lt_status=$lt_dlunknown
+ cat > conftest.$ac_ext <<_LT_EOF
+-#line 18562 "configure"
++#line 18563 "configure"
+ #include "confdefs.h"
+
+ #if HAVE_DLFCN_H
+--- a/src/gcc/configure.ac
++++ b/src/gcc/configure.ac
+@@ -903,7 +903,7 @@ AC_ARG_WITH(specs,
+ )
+ AC_SUBST(CONFIGURE_SPECS)
+
+-ACX_PKGVERSION([GCC])
++ACX_PKGVERSION([Linaro GCC `cat $srcdir/LINARO-VERSION`])
+ ACX_BUGURL([http://gcc.gnu.org/bugs.html])
+
+ # Sanity check enable_languages in case someone does not run the toplevel
+--- a/src/gcc/cppbuiltin.c
++++ b/src/gcc/cppbuiltin.c
+@@ -52,18 +52,41 @@ parse_basever (int *major, int *minor, int *patchlevel)
+ *patchlevel = s_patchlevel;
+ }
+
++/* Parse a LINAROVER version string of the format "M.m-year.month[-spin][~dev]"
++ to create Linaro release number YYYYMM and spin version. */
++static void
++parse_linarover (int *release, int *spin)
++{
++ static int s_year = -1, s_month, s_spin;
++
++ if (s_year == -1)
++ if (sscanf (LINAROVER, "%*[^-]-%d.%d-%d", &s_year, &s_month, &s_spin) != 3)
++ {
++ sscanf (LINAROVER, "%*[^-]-%d.%d", &s_year, &s_month);
++ s_spin = 0;
++ }
++
++ if (release)
++ *release = s_year * 100 + s_month;
++
++ if (spin)
++ *spin = s_spin;
++}
+
+ /* Define __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __VERSION__. */
+ static void
+ define__GNUC__ (cpp_reader *pfile)
+ {
+- int major, minor, patchlevel;
++ int major, minor, patchlevel, linaro_release, linaro_spin;
+
+ parse_basever (&major, &minor, &patchlevel);
++ parse_linarover (&linaro_release, &linaro_spin);
+ cpp_define_formatted (pfile, "__GNUC__=%d", major);
+ cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
+ cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
+ cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
++ cpp_define_formatted (pfile, "__LINARO_RELEASE__=%d", linaro_release);
++ cpp_define_formatted (pfile, "__LINARO_SPIN__=%d", linaro_spin);
+ cpp_define_formatted (pfile, "__ATOMIC_RELAXED=%d", MEMMODEL_RELAXED);
+ cpp_define_formatted (pfile, "__ATOMIC_SEQ_CST=%d", MEMMODEL_SEQ_CST);
+ cpp_define_formatted (pfile, "__ATOMIC_ACQUIRE=%d", MEMMODEL_ACQUIRE);
+--- a/src/gcc/expmed.c
++++ b/src/gcc/expmed.c
+@@ -2522,16 +2522,8 @@ expand_variable_shift (enum tree_code code, machine_mode mode, rtx shifted,
+ }
+
+
+-/* Indicates the type of fixup needed after a constant multiplication.
+- BASIC_VARIANT means no fixup is needed, NEGATE_VARIANT means that
+- the result should be negated, and ADD_VARIANT means that the
+- multiplicand should be added to the result. */
+-enum mult_variant {basic_variant, negate_variant, add_variant};
-
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
- vcvtx_f32_f64 (float64x2_t a)
+ static void synth_mult (struct algorithm *, unsigned HOST_WIDE_INT,
+ const struct mult_cost *, machine_mode mode);
+-static bool choose_mult_variant (machine_mode, HOST_WIDE_INT,
+- struct algorithm *, enum mult_variant *, int);
+ static rtx expand_mult_const (machine_mode, rtx, HOST_WIDE_INT, rtx,
+ const struct algorithm *, enum mult_variant);
+ static unsigned HOST_WIDE_INT invert_mod2n (unsigned HOST_WIDE_INT, int);
+@@ -3021,7 +3013,7 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
+ Return true if the cheapest of these cost less than MULT_COST,
+ describing the algorithm in *ALG and final fixup in *VARIANT. */
+
+-static bool
++bool
+ choose_mult_variant (machine_mode mode, HOST_WIDE_INT val,
+ struct algorithm *alg, enum mult_variant *variant,
+ int mult_cost)
+--- a/src/gcc/expmed.h
++++ b/src/gcc/expmed.h
+@@ -35,6 +35,15 @@ enum alg_code {
+ alg_impossible
+ };
+
++/* Indicates the type of fixup needed after a constant multiplication.
++ BASIC_VARIANT means no fixup is needed, NEGATE_VARIANT means that
++ the result should be negated, and ADD_VARIANT means that the
++ multiplicand should be added to the result. */
++enum mult_variant {basic_variant, negate_variant, add_variant};
++
++bool choose_mult_variant (machine_mode, HOST_WIDE_INT,
++ struct algorithm *, enum mult_variant *, int);
++
+ /* This structure holds the "cost" of a multiply sequence. The
+ "cost" field holds the total rtx_cost of every operator in the
+ synthetic multiplication sequence, hence cost(a op b) is defined
+--- a/src/gcc/fold-const.c
++++ b/src/gcc/fold-const.c
+@@ -7216,7 +7216,16 @@ native_encode_real (const_tree expr, unsigned char *ptr, int len, int off)
+ offset += byte % UNITS_PER_WORD;
+ }
+ else
+- offset = BYTES_BIG_ENDIAN ? 3 - byte : byte;
++ {
++ offset = byte;
++ if (BYTES_BIG_ENDIAN)
++ {
++ /* Reverse bytes within each long, or within the entire float
++ if it's smaller than a long (for HFmode). */
++ offset = MIN (3, total_bytes - 1) - offset;
++ gcc_assert (offset >= 0);
++ }
++ }
+ offset = offset + ((bitpos / BITS_PER_UNIT) & ~3);
+ if (offset >= off
+ && offset - off < len)
+--- a/src/gcc/genmultilib
++++ b/src/gcc/genmultilib
+@@ -186,7 +186,8 @@ fi
+ EOF
+ chmod +x tmpmultilib
+
+-combinations=`initial=/ ./tmpmultilib ${options}`
++combination_space=`initial=/ ./tmpmultilib ${options}`
++combinations="$combination_space"
+
+ # If there exceptions, weed them out now
+ if [ -n "${exceptions}" ]; then
+@@ -472,14 +473,19 @@ for rrule in ${multilib_reuse}; do
+ # in this variable, it means no multilib will be built for current reuse
+ # rule. Thus the reuse purpose specified by current rule is meaningless.
+ if expr "${combinations} " : ".*/${combo}/.*" > /dev/null; then
+- combo="/${combo}/"
+- dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" "${enable_multilib}"`
+- copts="/${copts}/"
+- optout=`./tmpmultilib4 "${copts}" "${options}"`
+- # Output the line with all appropriate matches.
+- dirout="${dirout}" optout="${optout}" ./tmpmultilib2
++ if expr "${combination_space} " : ".*/${copts}/.*" > /dev/null; then
++ combo="/${combo}/"
++ dirout=`./tmpmultilib3 "${combo}" "${todirnames}" "${toosdirnames}" "${enable_multilib}"`
++ copts="/${copts}/"
++ optout=`./tmpmultilib4 "${copts}" "${options}"`
++ # Output the line with all appropriate matches.
++ dirout="${dirout}" optout="${optout}" ./tmpmultilib2
++ else
++ echo "The rule ${rrule} contains an option absent from MULTILIB_OPTIONS." >&2
++ exit 1
++ fi
+ else
+- echo "The rule ${rrule} is trying to reuse nonexistent multilib."
++ echo "The rule ${rrule} is trying to reuse nonexistent multilib." >&2
+ exit 1
+ fi
+ done
+--- a/src/gcc/ifcvt.c
++++ b/src/gcc/ifcvt.c
+@@ -813,10 +813,15 @@ struct noce_if_info
+
+ /* Estimated cost of the particular branch instruction. */
+ unsigned int branch_cost;
++
++ /* The name of the noce transform that succeeded in if-converting
++ this structure. Used for debugging. */
++ const char *transform_name;
+ };
+
+ static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
+ static int noce_try_move (struct noce_if_info *);
++static int noce_try_ifelse_collapse (struct noce_if_info *);
+ static int noce_try_store_flag (struct noce_if_info *);
+ static int noce_try_addcc (struct noce_if_info *);
+ static int noce_try_store_flag_constants (struct noce_if_info *);
+@@ -1115,11 +1120,45 @@ noce_try_move (struct noce_if_info *if_info)
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
+ }
++ if_info->transform_name = "noce_try_move";
+ return TRUE;
+ }
+ return FALSE;
+ }
+
++/* Try forming an IF_THEN_ELSE (cond, b, a) and collapsing that
++ through simplify_rtx. Sometimes that can eliminate the IF_THEN_ELSE.
++ If that is the case, emit the result into x. */
++
++static int
++noce_try_ifelse_collapse (struct noce_if_info * if_info)
++{
++ if (!noce_simple_bbs (if_info))
++ return FALSE;
++
++ machine_mode mode = GET_MODE (if_info->x);
++ rtx if_then_else = simplify_gen_ternary (IF_THEN_ELSE, mode, mode,
++ if_info->cond, if_info->b,
++ if_info->a);
++
++ if (GET_CODE (if_then_else) == IF_THEN_ELSE)
++ return FALSE;
++
++ rtx_insn *seq;
++ start_sequence ();
++ noce_emit_move_insn (if_info->x, if_then_else);
++ seq = end_ifcvt_sequence (if_info);
++ if (!seq)
++ return FALSE;
++
++ emit_insn_before_setloc (seq, if_info->jump,
++ INSN_LOCATION (if_info->insn_a));
++
++ if_info->transform_name = "noce_try_ifelse_collapse";
++ return TRUE;
++}
++
++
+ /* Convert "if (test) x = 1; else x = 0".
+
+ Only try 0 and STORE_FLAG_VALUE here. Other combinations will be
+@@ -1163,6 +1202,7 @@ noce_try_store_flag (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_store_flag";
+ return TRUE;
+ }
+ else
+@@ -1241,6 +1281,7 @@ noce_try_inverse_constants (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_inverse_constants";
+ return true;
+ }
+
+@@ -1461,6 +1502,8 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_store_flag_constants";
++
+ return TRUE;
+ }
+
+@@ -1513,6 +1556,8 @@ noce_try_addcc (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_addcc";
++
+ return TRUE;
+ }
+ end_sequence ();
+@@ -1553,6 +1598,7 @@ noce_try_addcc (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_addcc";
+ return TRUE;
+ }
+ end_sequence ();
+@@ -1617,6 +1663,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_store_flag_mask";
++
+ return TRUE;
+ }
+
+@@ -1767,6 +1815,8 @@ noce_try_cmove (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_cmove";
++
+ return TRUE;
+ }
+ /* If both a and b are constants try a last-ditch transformation:
+@@ -1820,6 +1870,7 @@ noce_try_cmove (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_cmove";
+ return TRUE;
+ }
+ else
+@@ -2273,6 +2324,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
+
+ emit_insn_before_setloc (ifcvt_seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_cmove_arith";
+ return TRUE;
+
+ end_seq_and_fail:
+@@ -2364,28 +2416,32 @@ noce_get_alt_condition (struct noce_if_info *if_info, rtx target,
+ switch (code)
+ {
+ case LT:
+- if (actual_val == desired_val + 1)
++ if (desired_val != HOST_WIDE_INT_MAX
++ && actual_val == desired_val + 1)
+ {
+ code = LE;
+ op_b = GEN_INT (desired_val);
+ }
+ break;
+ case LE:
+- if (actual_val == desired_val - 1)
++ if (desired_val != HOST_WIDE_INT_MIN
++ && actual_val == desired_val - 1)
+ {
+ code = LT;
+ op_b = GEN_INT (desired_val);
+ }
+ break;
+ case GT:
+- if (actual_val == desired_val - 1)
++ if (desired_val != HOST_WIDE_INT_MIN
++ && actual_val == desired_val - 1)
+ {
+ code = GE;
+ op_b = GEN_INT (desired_val);
+ }
+ break;
+ case GE:
+- if (actual_val == desired_val + 1)
++ if (desired_val != HOST_WIDE_INT_MAX
++ && actual_val == desired_val + 1)
+ {
+ code = GT;
+ op_b = GEN_INT (desired_val);
+@@ -2525,6 +2581,7 @@ noce_try_minmax (struct noce_if_info *if_info)
+ emit_insn_before_setloc (seq, if_info->jump, INSN_LOCATION (if_info->insn_a));
+ if_info->cond = cond;
+ if_info->cond_earliest = earliest;
++ if_info->transform_name = "noce_try_minmax";
+
+ return TRUE;
+ }
+@@ -2691,6 +2748,7 @@ noce_try_abs (struct noce_if_info *if_info)
+ emit_insn_before_setloc (seq, if_info->jump, INSN_LOCATION (if_info->insn_a));
+ if_info->cond = cond;
+ if_info->cond_earliest = earliest;
++ if_info->transform_name = "noce_try_abs";
+
+ return TRUE;
+ }
+@@ -2772,6 +2830,8 @@ noce_try_sign_mask (struct noce_if_info *if_info)
+ return FALSE;
+
+ emit_insn_before_setloc (seq, if_info->jump, INSN_LOCATION (if_info->insn_a));
++ if_info->transform_name = "noce_try_sign_mask";
++
+ return TRUE;
+ }
+
+@@ -2877,6 +2937,7 @@ noce_try_bitop (struct noce_if_info *if_info)
+ emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
+ }
++ if_info->transform_name = "noce_try_bitop";
+ return TRUE;
+ }
+
+@@ -3167,6 +3228,41 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
+ if (if_info->then_else_reversed)
+ std::swap (old_val, new_val);
+
++
++ /* We allow simple lowpart register subreg SET sources in
++ bb_ok_for_noce_convert_multiple_sets. Be careful when processing
++ sequences like:
++ (set (reg:SI r1) (reg:SI r2))
++ (set (reg:HI r3) (subreg:HI (r1)))
++ For the second insn new_val or old_val (r1 in this example) will be
++ taken from the temporaries and have the wider mode which will not
++ match with the mode of the other source of the conditional move, so
++ we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
++ Wrap the two cmove operands into subregs if appropriate to prevent
++ that. */
++ if (GET_MODE (new_val) != GET_MODE (temp))
++ {
++ machine_mode src_mode = GET_MODE (new_val);
++ machine_mode dst_mode = GET_MODE (temp);
++ if (GET_MODE_SIZE (src_mode) <= GET_MODE_SIZE (dst_mode))
++ {
++ end_sequence ();
++ return FALSE;
++ }
++ new_val = lowpart_subreg (dst_mode, new_val, src_mode);
++ }
++ if (GET_MODE (old_val) != GET_MODE (temp))
++ {
++ machine_mode src_mode = GET_MODE (old_val);
++ machine_mode dst_mode = GET_MODE (temp);
++ if (GET_MODE_SIZE (src_mode) <= GET_MODE_SIZE (dst_mode))
++ {
++ end_sequence ();
++ return FALSE;
++ }
++ old_val = lowpart_subreg (dst_mode, old_val, src_mode);
++ }
++
+ /* Actually emit the conditional move. */
+ rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+ x, y, new_val, old_val);
+@@ -3240,6 +3336,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
+ }
+
+ num_updated_if_blocks++;
++ if_info->transform_name = "noce_convert_multiple_sets";
+ return TRUE;
+ }
+
+@@ -3277,9 +3374,15 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
+ rtx src = SET_SRC (set);
+
+ /* We can possibly relax this, but for now only handle REG to REG
+- moves. This avoids any issues that might come from introducing
+- loads/stores that might violate data-race-freedom guarantees. */
+- if (!(REG_P (src) && REG_P (dest)))
++ (including subreg) moves. This avoids any issues that might come
++ from introducing loads/stores that might violate data-race-freedom
++ guarantees. */
++ if (!REG_P (dest))
++ return false;
++
++ if (!(REG_P (src)
++ || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
++ && subreg_lowpart_p (src))))
+ return false;
+
+ /* Destination must be appropriate for a conditional write. */
+@@ -3336,7 +3439,12 @@ noce_process_if_block (struct noce_if_info *if_info)
+ && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+ {
+ if (noce_convert_multiple_sets (if_info))
+- return TRUE;
++ {
++ if (dump_file && if_info->transform_name)
++ fprintf (dump_file, "if-conversion succeeded through %s\n",
++ if_info->transform_name);
++ return TRUE;
++ }
+ }
+
+ if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
+@@ -3493,6 +3601,8 @@ noce_process_if_block (struct noce_if_info *if_info)
+
+ if (noce_try_move (if_info))
+ goto success;
++ if (noce_try_ifelse_collapse (if_info))
++ goto success;
+ if (noce_try_store_flag (if_info))
+ goto success;
+ if (noce_try_bitop (if_info))
+@@ -3533,6 +3643,9 @@ noce_process_if_block (struct noce_if_info *if_info)
+ return FALSE;
+
+ success:
++ if (dump_file && if_info->transform_name)
++ fprintf (dump_file, "if-conversion succeeded through %s\n",
++ if_info->transform_name);
+
+ /* If we used a temporary, fix it up now. */
+ if (orig_x != x)
+--- a/src/gcc/internal-fn.c
++++ b/src/gcc/internal-fn.c
+@@ -1810,11 +1810,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
+ /* For sub-word operations, retry with a wider type first. */
+ if (orig_precres == precres && precop <= BITS_PER_WORD)
+ {
+-#if WORD_REGISTER_OPERATIONS
+- int p = BITS_PER_WORD;
+-#else
+- int p = precop;
+-#endif
++ int p = WORD_REGISTER_OPERATIONS ? BITS_PER_WORD : precop;
+ enum machine_mode m = smallest_mode_for_size (p, MODE_INT);
+ tree optype = build_nonstandard_integer_type (GET_MODE_PRECISION (m),
+ uns0_p && uns1_p
+--- a/src/gcc/lra-constraints.c
++++ b/src/gcc/lra-constraints.c
+@@ -1326,7 +1326,22 @@ process_addr_reg (rtx *loc, bool check_only_p, rtx_insn **before, rtx_insn **aft
+
+ subreg_p = GET_CODE (*loc) == SUBREG;
+ if (subreg_p)
+- loc = &SUBREG_REG (*loc);
++ {
++ reg = SUBREG_REG (*loc);
++ mode = GET_MODE (reg);
++
++      /* For a mode whose size is bigger than ptr_mode, there is unlikely to
++	 be a "mov" between two registers with different classes, but there
++	 will normally be a "mov" which transfers an element of a vector
++	 register into a general register, and this will normally be a subreg
++	 which should be reloaded as a whole.  This is particularly likely to
++	 be triggered when -fno-split-wide-types is specified.  */
++ if (!REG_P (reg)
++ || in_class_p (reg, cl, &new_class)
++ || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
++ loc = &SUBREG_REG (*loc);
++ }
++
+ reg = *loc;
+ mode = GET_MODE (reg);
+ if (! REG_P (reg))
+@@ -2475,14 +2490,29 @@ process_alt_operands (int only_alternative)
+ /* We are trying to spill pseudo into memory. It is
+ usually more costly than moving to a hard register
+ although it might takes the same number of
+- reloads. */
+- if (no_regs_p && REG_P (op) && hard_regno[nop] >= 0)
++ reloads.
++
++ Non-pseudo spill may happen also. Suppose a target allows both
++ register and memory in the operand constraint alternatives,
++	     then it's typical that an eliminable register has a substitution
++ of "base + offset" which can either be reloaded by a simple
++ "new_reg <= base + offset" which will match the register
++ constraint, or a similar reg addition followed by further spill
++ to and reload from memory which will match the memory
++ constraint, but this memory spill will be much more costly
++ usually.
++
++ Code below increases the reject for both pseudo and non-pseudo
++ spill. */
++ if (no_regs_p
++ && !(MEM_P (op) && offmemok)
++ && !(REG_P (op) && hard_regno[nop] < 0))
+ {
+ if (lra_dump_file != NULL)
+ fprintf
+ (lra_dump_file,
+- " %d Spill pseudo into memory: reject+=3\n",
+- nop);
++ " %d Spill %spseudo into memory: reject+=3\n",
++ nop, REG_P (op) ? "" : "Non-");
+ reject += 3;
+ if (VECTOR_MODE_P (mode))
+ {
+--- a/src/gcc/lto/lto-partition.c
++++ b/src/gcc/lto/lto-partition.c
+@@ -447,7 +447,7 @@ add_sorted_nodes (vec<symtab_node *> &next_nodes, ltrans_partition partition)
+ and in-partition calls was reached. */
+
+ void
+-lto_balanced_map (int n_lto_partitions)
++lto_balanced_map (int n_lto_partitions, int max_partition_size)
{
-@@ -7938,61 +7643,6 @@ vmovn_u64 (uint64x2_t a)
- return result;
+ int n_nodes = 0;
+ int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
+@@ -511,6 +511,9 @@ lto_balanced_map (int n_lto_partitions)
+ varpool_order.qsort (varpool_node_cmp);
+
+ /* Compute partition size and create the first partition. */
++ if (PARAM_VALUE (MIN_PARTITION_SIZE) > max_partition_size)
++ fatal_error (input_location, "min partition size cannot be greater than max partition size");
++
+ partition_size = total_size / n_lto_partitions;
+ if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
+ partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
+@@ -719,7 +722,8 @@ lto_balanced_map (int n_lto_partitions)
+ best_cost, best_internal, best_i);
+ /* Partition is too large, unwind into step when best cost was reached and
+ start new partition. */
+- if (partition->insns > 2 * partition_size)
++ if (partition->insns > 2 * partition_size
++ || partition->insns > max_partition_size)
+ {
+ if (best_i != i)
+ {
+--- a/src/gcc/lto/lto-partition.h
++++ b/src/gcc/lto/lto-partition.h
+@@ -35,7 +35,7 @@ extern vec<ltrans_partition> ltrans_partitions;
+
+ void lto_1_to_1_map (void);
+ void lto_max_map (void);
+-void lto_balanced_map (int);
++void lto_balanced_map (int, int);
+ void lto_promote_cross_file_statics (void);
+ void free_ltrans_partitions (void);
+ void lto_promote_statics_nonwpa (void);
+--- a/src/gcc/lto/lto.c
++++ b/src/gcc/lto/lto.c
+@@ -3117,9 +3117,10 @@ do_whole_program_analysis (void)
+ else if (flag_lto_partition == LTO_PARTITION_MAX)
+ lto_max_map ();
+ else if (flag_lto_partition == LTO_PARTITION_ONE)
+- lto_balanced_map (1);
++ lto_balanced_map (1, INT_MAX);
+ else if (flag_lto_partition == LTO_PARTITION_BALANCED)
+- lto_balanced_map (PARAM_VALUE (PARAM_LTO_PARTITIONS));
++ lto_balanced_map (PARAM_VALUE (PARAM_LTO_PARTITIONS),
++ PARAM_VALUE (MAX_PARTITION_SIZE));
+ else
+ gcc_unreachable ();
+
+--- a/src/gcc/params.def
++++ b/src/gcc/params.def
+@@ -1027,7 +1027,12 @@ DEFPARAM (PARAM_LTO_PARTITIONS,
+ DEFPARAM (MIN_PARTITION_SIZE,
+ "lto-min-partition",
+ "Minimal size of a partition for LTO (in estimated instructions).",
+- 1000, 0, 0)
++ 10000, 0, 0)
++
++DEFPARAM (MAX_PARTITION_SIZE,
++ "lto-max-partition",
++ "Maximal size of a partition for LTO (in estimated instructions).",
++ 1000000, 0, INT_MAX)
+
+ /* Diagnostic parameters. */
+
+--- a/src/gcc/rtlanal.c
++++ b/src/gcc/rtlanal.c
+@@ -3657,6 +3657,16 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
+ info->offset = offset / regsize_xmode;
+ return;
+ }
++ /* It's not valid to extract a subreg of mode YMODE at OFFSET that
++ would go outside of XMODE. */
++ if (!rknown
++ && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode))
++ {
++ info->representable_p = false;
++ info->nregs = nregs_ymode;
++ info->offset = offset / regsize_xmode;
++ return;
++ }
+ /* Quick exit for the simple and common case of extracting whole
+ subregisters from a multiregister value. */
+ /* ??? It would be better to integrate this into the code below,
+@@ -4584,13 +4594,14 @@ nonzero_bits1 (const_rtx x, machine_mode mode, const_rtx known_x,
+ nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
+ known_x, known_mode, known_ret);
+
+-#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
++#ifdef LOAD_EXTEND_OP
+ /* If this is a typical RISC machine, we only have to worry
+ about the way loads are extended. */
+- if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
+- ? val_signbit_known_set_p (inner_mode, nonzero)
+- : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
+- || !MEM_P (SUBREG_REG (x)))
++ if (WORD_REGISTER_OPERATIONS
++ && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
++ ? val_signbit_known_set_p (inner_mode, nonzero)
++ : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
++ || !MEM_P (SUBREG_REG (x))))
+ #endif
+ {
+ /* On many CISC machines, accessing an object in a wider mode
+--- a/src/gcc/simplify-rtx.c
++++ b/src/gcc/simplify-rtx.c
+@@ -5266,6 +5266,50 @@ simplify_const_relational_operation (enum rtx_code code,
+
+ return 0;
+ }
++
++/* Recognize expressions of the form (X CMP 0) ? VAL : OP (X)
++ where OP is CLZ or CTZ and VAL is the value from CLZ_DEFINED_VALUE_AT_ZERO
++ or CTZ_DEFINED_VALUE_AT_ZERO respectively and return OP (X) if the expression
++ can be simplified to that or NULL_RTX if not.
++ Assume X is compared against zero with CMP_CODE and the true
++ arm is TRUE_VAL and the false arm is FALSE_VAL. */
++
++static rtx
++simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
++{
++ if (cmp_code != EQ && cmp_code != NE)
++ return NULL_RTX;
++
++ /* Result on X == 0 and X !=0 respectively. */
++ rtx on_zero, on_nonzero;
++ if (cmp_code == EQ)
++ {
++ on_zero = true_val;
++ on_nonzero = false_val;
++ }
++ else
++ {
++ on_zero = false_val;
++ on_nonzero = true_val;
++ }
++
++ rtx_code op_code = GET_CODE (on_nonzero);
++ if ((op_code != CLZ && op_code != CTZ)
++ || !rtx_equal_p (XEXP (on_nonzero, 0), x)
++ || !CONST_INT_P (on_zero))
++ return NULL_RTX;
++
++ HOST_WIDE_INT op_val;
++ if (((op_code == CLZ
++ && CLZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val))
++ || (op_code == CTZ
++ && CTZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val)))
++ && op_val == INTVAL (on_zero))
++ return on_nonzero;
++
++ return NULL_RTX;
++}
++
+
+ /* Simplify CODE, an operation with result mode MODE and three operands,
+ OP0, OP1, and OP2. OP0_MODE was the mode of OP0 before it became
+@@ -5399,6 +5443,19 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
+ }
+ }
+
++ /* Convert x == 0 ? N : clz (x) into clz (x) when
++ CLZ_DEFINED_VALUE_AT_ZERO is defined to N for the mode of x.
++ Similarly for ctz (x). */
++ if (COMPARISON_P (op0) && !side_effects_p (op0)
++ && XEXP (op0, 1) == const0_rtx)
++ {
++ rtx simplified
++ = simplify_cond_clz_ctz (XEXP (op0, 0), GET_CODE (op0),
++ op1, op2);
++ if (simplified)
++ return simplified;
++ }
++
+ if (COMPARISON_P (op0) && ! side_effects_p (op0))
+ {
+ machine_mode cmp_mode = (GET_MODE (XEXP (op0, 0)) == VOIDmode
+--- a/src/gcc/testsuite/c-c++-common/asan/clone-test-1.c
++++ b/src/gcc/testsuite/c-c++-common/asan/clone-test-1.c
+@@ -29,6 +29,10 @@ int main(int argc, char **argv) {
+ char *sp = child_stack + kStackSize; /* Stack grows down. */
+ printf("Parent: %p\n", sp);
+ pid_t clone_pid = clone(Child, sp, CLONE_FILES | CLONE_VM, NULL, 0, 0, 0);
++ if (clone_pid == -1) {
++ perror("clone");
++ return 1;
++ }
+ int status;
+ pid_t wait_result = waitpid(clone_pid, &status, __WCLONE);
+ if (wait_result < 0) {
+--- a/src/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
++++ b/src/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C
+@@ -1,5 +1,6 @@
+ /* Test various operators on __fp16 and mixed __fp16/float operands. */
+ /* { dg-do run { target arm*-*-* } } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
+
+ #include "arm-fp16-ops.h"
+--- a/src/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
++++ b/src/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C
+@@ -1,5 +1,6 @@
+ /* Test various operators on __fp16 and mixed __fp16/float operands. */
+ /* { dg-do run { target arm*-*-* } } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative -ffast-math" } */
+
+ #include "arm-fp16-ops.h"
+--- a/src/gcc/testsuite/g++.dg/ext/arm-fp16/fp16-param-1.C
++++ b/src/gcc/testsuite/g++.dg/ext/arm-fp16/fp16-param-1.C
+@@ -1,10 +1,14 @@
+ /* { dg-do compile { target arm*-*-* } } */
+ /* { dg-options "-mfp16-format=ieee" } */
+
+-/* Functions cannot have parameters of type __fp16. */
+-extern void f (__fp16); /* { dg-error "parameters cannot have __fp16 type" } */
+-extern void (*pf) (__fp16); /* { dg-error "parameters cannot have __fp16 type" } */
++/* Test that the ACLE macro is defined. */
++#if __ARM_FP16_ARGS != 1
++#error Unexpected value for __ARM_FP16_ARGS
++#endif
++
++/* Test that __fp16 is supported as a parameter type. */
++extern void f (__fp16);
++extern void (*pf) (__fp16);
+
+-/* These should be OK. */
+ extern void g (__fp16 *);
+ extern void (*pg) (__fp16 *);
+--- a/src/gcc/testsuite/g++.dg/ext/arm-fp16/fp16-return-1.C
++++ b/src/gcc/testsuite/g++.dg/ext/arm-fp16/fp16-return-1.C
+@@ -1,10 +1,9 @@
+ /* { dg-do compile { target arm*-*-* } } */
+ /* { dg-options "-mfp16-format=ieee" } */
+
+-/* Functions cannot return type __fp16. */
+-extern __fp16 f (void); /* { dg-error "cannot return __fp16" } */
+-extern __fp16 (*pf) (void); /* { dg-error "cannot return __fp16" } */
++/* Test that __fp16 is supported as a return type. */
++extern __fp16 f (void);
++extern __fp16 (*pf) (void);
+
+-/* These should be OK. */
+ extern __fp16 *g (void);
+ extern __fp16 *(*pg) (void);
+--- a/src/gcc/testsuite/g++.dg/inherit/thunk1.C
++++ b/src/gcc/testsuite/g++.dg/inherit/thunk1.C
+@@ -1,4 +1,5 @@
+-// { dg-do run { target i?86-*-* x86_64-*-* s390*-*-* alpha*-*-* ia64-*-* sparc*-*-* } }
++// { dg-do run { target arm*-*-* aarch64*-*-* i?86-*-* x86_64-*-* s390*-*-* alpha*-*-* ia64-*-* sparc*-*-* } }
++// { dg-skip-if "" { arm_thumb1_ok } }
+
+ #include <stdarg.h>
+
+--- a/src/gcc/testsuite/g++.dg/lto/pr69589_0.C
++++ b/src/gcc/testsuite/g++.dg/lto/pr69589_0.C
+@@ -1,6 +1,8 @@
+ // { dg-lto-do link }
+-// { dg-lto-options "-O2 -rdynamic" }
++// { dg-lto-options "-O2 -rdynamic" }
+ // { dg-extra-ld-options "-r -nostdlib" }
++// { dg-skip-if "Skip targets without -rdynamic support" { arm*-none-eabi aarch64*-*-elf } { "*" } { "" } }
++
+ #pragma GCC visibility push(hidden)
+ struct A { int &operator[] (long); };
+ template <typename> struct B;
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.c-torture/compile/pr71295.c
+@@ -0,0 +1,12 @@
++extern void fn2 (long long);
++int a;
++
++void
++fn1 ()
++{
++ long long b[3];
++ a = 0;
++ for (; a < 3; a++)
++ b[a] = 1;
++ fn2 (b[1]);
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.c-torture/execute/pr37780.c
+@@ -0,0 +1,49 @@
++/* PR middle-end/37780. */
++
++#define VAL (8 * sizeof (int))
++
++int __attribute__ ((noinline, noclone))
++fooctz (int i)
++{
++ return (i == 0) ? VAL : __builtin_ctz (i);
++}
++
++int __attribute__ ((noinline, noclone))
++fooctz2 (int i)
++{
++ return (i != 0) ? __builtin_ctz (i) : VAL;
++}
++
++unsigned int __attribute__ ((noinline, noclone))
++fooctz3 (unsigned int i)
++{
++ return (i > 0) ? __builtin_ctz (i) : VAL;
++}
++
++int __attribute__ ((noinline, noclone))
++fooclz (int i)
++{
++ return (i == 0) ? VAL : __builtin_clz (i);
++}
++
++int __attribute__ ((noinline, noclone))
++fooclz2 (int i)
++{
++ return (i != 0) ? __builtin_clz (i) : VAL;
++}
++
++unsigned int __attribute__ ((noinline, noclone))
++fooclz3 (unsigned int i)
++{
++ return (i > 0) ? __builtin_clz (i) : VAL;
++}
++
++int
++main (void)
++{
++ if (fooctz (0) != VAL || fooctz2 (0) != VAL || fooctz3 (0) != VAL
++ || fooclz (0) != VAL || fooclz2 (0) != VAL || fooclz3 (0) != VAL)
++ __builtin_abort ();
++
++ return 0;
++}
+\ No newline at end of file
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.c-torture/execute/pr66940.c
+@@ -0,0 +1,20 @@
++long long __attribute__ ((noinline, noclone))
++foo (long long ival)
++{
++ if (ival <= 0)
++ return -0x7fffffffffffffffL - 1;
++
++ return 0x7fffffffffffffffL;
++}
++
++int
++main (void)
++{
++ if (foo (-1) != (-0x7fffffffffffffffL - 1))
++ __builtin_abort ();
++
++ if (foo (1) != 0x7fffffffffffffffL)
++ __builtin_abort ();
++
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.dg/asr_div1.c
++++ b/src/gcc/testsuite/gcc.dg/asr_div1.c
+@@ -1,6 +1,7 @@
+ /* Test division by const int generates only one shift. */
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fdump-rtl-combine-all" } */
++/* { dg-options "-O2 -fdump-rtl-combine-all -mtune=cortex-a53" { target aarch64*-*-* } } */
+
+ extern void abort (void);
+
+--- a/src/gcc/testsuite/gcc.dg/cpp/warn-undef-2.c
++++ b/src/gcc/testsuite/gcc.dg/cpp/warn-undef-2.c
+@@ -1,5 +1,5 @@
+ // { dg-do preprocess }
+ // { dg-options "-std=gnu99 -fdiagnostics-show-option -Werror=undef" }
+ /* { dg-message "some warnings being treated as errors" "" {target "*-*-*"} 0 } */
+-#if x // { dg-error "\"x\" is not defined .-Werror=undef." }
++#if x // { dg-error "\"x\" is not defined, evaluates to 0 .-Werror=undef." }
+ #endif
+--- a/src/gcc/testsuite/gcc.dg/cpp/warn-undef.c
++++ b/src/gcc/testsuite/gcc.dg/cpp/warn-undef.c
+@@ -1,5 +1,5 @@
+ // { dg-do preprocess }
+ // { dg-options "-std=gnu99 -fdiagnostics-show-option -Wundef" }
+
+-#if x // { dg-warning "\"x\" is not defined .-Wundef." }
++#if x // { dg-warning "\"x\" is not defined, evaluates to 0 .-Wundef." }
+ #endif
+--- a/src/gcc/testsuite/gcc.dg/plugin/plugin.exp
++++ b/src/gcc/testsuite/gcc.dg/plugin/plugin.exp
+@@ -87,6 +87,12 @@ foreach plugin_test $plugin_test_list {
+ if ![runtest_file_p $runtests $plugin_src] then {
+ continue
+ }
++ # Skip tail call tests on targets that do not have sibcall_epilogue.
++ if {[regexp ".*must_tail_call_plugin.c" $plugin_src]
++ && [istarget arm*-*-*]
++ && [check_effective_target_arm_thumb1]} then {
++ continue
++ }
+ set plugin_input_tests [lreplace $plugin_test 0 0]
+ plugin-test-execute $plugin_src $plugin_input_tests
}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/pr59833.c
+@@ -0,0 +1,18 @@
++/* { dg-do run { target { *-*-linux* *-*-gnu* } } } */
++/* { dg-options "-O0 -lm" } */
++/* { dg-require-effective-target issignaling } */
++
++#define _GNU_SOURCE
++#include <math.h>
++
++int main (void)
++{
++ float sNaN = __builtin_nansf ("");
++ double x = (double) sNaN;
++ if (issignaling(x))
++ {
++ __builtin_abort();
++ }
++
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/pr68217.c
+@@ -0,0 +1,14 @@
++
++/* { dg-do compile } */
++/* { dg-options "-O2 -fdump-tree-vrp1" } */
++
++int foo (void)
++{
++ volatile int a = -1;
++ long long b = (1LL << (sizeof (b) * 8 - 1)); // LLONG_MIN
++ long long x = (a & b); // x == 0x8000000000000000
++ if (x < 1LL) { ; } else { __builtin_abort(); }
++ return 0;
++}
++
++/* { dg-final { scan-tree-dump "\\\[-INF, 0\\\]" "vrp1" } } */
+--- a/src/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c
++++ b/src/gcc/testsuite/gcc.dg/torture/arm-fp16-int-convert-alt.c
+@@ -1,5 +1,6 @@
+ /* Test floating-point conversions. Standard types and __fp16. */
+ /* { dg-do run { target arm*-*-* } } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
+
+ #include "fp-int-convert.h"
+--- a/src/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c
++++ b/src/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c
+@@ -1,5 +1,6 @@
+ /* Test various operators on __fp16 and mixed __fp16/float operands. */
+ /* { dg-do run { target arm*-*-* } } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
+
+ #include "arm-fp16-ops.h"
+--- a/src/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c
++++ b/src/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c
+@@ -1,5 +1,6 @@
+ /* Test various operators on __fp16 and mixed __fp16/float operands. */
+ /* { dg-do run { target arm*-*-* } } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative -ffast-math" } */
+
+ #include "arm-fp16-ops.h"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/torture/pr71594.c
+@@ -0,0 +1,15 @@
++/* { dg-do compile } */
++/* { dg-options "--param max-rtl-if-conversion-insns=2" } */
++
++unsigned short a;
++int b, c;
++int *d;
++void fn1() {
++ *d = 24;
++ for (; *d <= 65;) {
++ unsigned short *e = &a;
++ b = (a &= 0 <= 0) < (c ?: (*e %= *d));
++ for (; *d <= 83;)
++ ;
++ }
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
+@@ -0,0 +1,44 @@
++/* PR tree-optimization/61839. */
++/* { dg-do run } */
++/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
++/* { dg-require-effective-target int32plus } */
++
++__attribute__ ((noinline))
++int foo ()
++{
++ int a = -1;
++ volatile unsigned b = 1U;
++ int c = 1;
++ c = (a + 972195718) >> (1LU <= b);
++ if (c == 486097858)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++__attribute__ ((noinline))
++int bar ()
++{
++ int a = -1;
++ volatile unsigned b = 1U;
++ int c = 1;
++ c = (a + 972195718) >> (b ? 2 : 3);
++ if (c == 243048929)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++int main ()
++{
++ foo ();
++ bar ();
++}
++
++/* Scan for c = 972195717) >> [0, 1] in function foo. */
++/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1 "vrp1" } } */
++/* Scan for c = 972195717) >> [2, 3] in function bar. */
++/* { dg-final { scan-tree-dump-times "243048929 : 121524464" 2 "vrp1" } } */
++/* { dg-final { scan-tree-dump-times "486097858" 0 "optimized" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
+@@ -0,0 +1,54 @@
++/* PR tree-optimization/61839. */
++/* { dg-do compile } */
++/* { dg-options "-O2 -fdump-tree-vrp1" } */
++/* { dg-require-effective-target int32plus } */
++
++__attribute__ ((noinline))
++int foo ()
++{
++ int a = -1;
++ volatile unsigned b = 1U;
++ int c = 1;
++ c = (a + 972195718) / (b ? 1 : 0);
++ if (c == 972195717)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++__attribute__ ((noinline))
++int bar ()
++{
++ int a = -1;
++ volatile unsigned b = 1U;
++ int c = 1;
++ c = (a + 972195718) % (b ? 1 : 0);
++ if (c == 972195717)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++__attribute__ ((noinline))
++int bar2 ()
++{
++ int a = -1;
++ volatile unsigned b = 1U;
++ int c = 1;
++ c = (a + 972195716) % (b ? 1 : 2);
++ if (c == 972195715)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++
++/* Dont optimize 972195717 / 0 in function foo. */
++/* { dg-final { scan-tree-dump-times "972195717 / _" 1 "vrp1" } } */
++/* Dont optimize 972195717 % 0 in function bar. */
++/* { dg-final { scan-tree-dump-times "972195717 % _" 1 "vrp1" } } */
++/* Optimize in function bar2. */
++/* { dg-final { scan-tree-dump-times "972195715 % _" 0 "vrp1" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
+@@ -0,0 +1,26 @@
++/* PR tree-optimization/61839. */
++/* { dg-do run } */
++/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
++
++__attribute__ ((noinline))
++int foo (int a, unsigned b)
++{
++ int c = 1;
++ b = a ? 12 : 13;
++ c = b << 8;
++ if (c == 3072)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++int main ()
++{
++ volatile unsigned b = 1U;
++ foo (-1, b);
++}
++
++/* Scan for c [12, 13] << 8 in function foo. */
++/* { dg-final { scan-tree-dump-times "3072 : 3328" 2 "vrp1" } } */
++/* { dg-final { scan-tree-dump-times "3072" 0 "optimized" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/pr61839_4.c
+@@ -0,0 +1,28 @@
++/* PR tree-optimization/61839. */
++/* { dg-do run } */
++/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
++/* { dg-require-effective-target int32plus } */
++
++__attribute__ ((noinline))
++int foo (int a, unsigned b)
++{
++ unsigned c = 1;
++ if (b >= 1 && b <= ((unsigned)(-1) - 1))
++ return 0;
++ c = b >> 4;
++ if (c == 268435455)
++ ;
++ else
++ __builtin_abort ();
++ return 0;
++}
++
++int main ()
++{
++ volatile unsigned b = (unsigned)(-1);
++ foo (-1, b);
++}
++
++/* Scan for ~[1, 4294967294] >> 4 in function foo. */
++/* { dg-final { scan-tree-dump-times "0 : 268435455" 1 "vrp1" } } */
++/* { dg-final { scan-tree-dump-times "268435455" 0 "optimized" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
+@@ -0,0 +1,28 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
++
++int a[128];
++extern int b[];
++
++int bar (int *);
++
++int
++foo (int n)
++{
++ int i;
++
++ for (i = 0; i < n; i++)
++ {
++ unsigned char uc = (unsigned char)i;
++ a[i] = i;
++ b[uc] = 0;
++ }
++
++ bar (a);
++ return 0;
++}
++
++/* Address of array reference to b is scev. */
++/* { dg-final { scan-tree-dump-times "use \[0-9\]\n address" 2 "ivopts" } } */
++
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/scev-12.c
+@@ -0,0 +1,30 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
++
++int a[128];
++extern int b[];
++
++int bar (int *);
++
++int
++foo (int x, int n)
++{
++ int i;
++
++ for (i = 0; i < n; i++)
++ {
++ unsigned char uc = (unsigned char)i;
++ if (x)
++ a[i] = i;
++ b[uc] = 0;
++ }
++
++ bar (a);
++ return 0;
++}
++
++/* Address of array reference to b is not scev. */
++/* { dg-final { scan-tree-dump-times "use \[0-9\]\n address" 1 "ivopts" } } */
++
++
++
+--- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
+@@ -25,6 +25,7 @@ f1 (int i, ...)
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -45,6 +46,7 @@ f2 (int i, ...)
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -60,6 +62,7 @@ f3 (int i, ...)
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 1 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 16 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[1-9\]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[1-9\]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
+@@ -78,6 +81,7 @@ f4 (int i, ...)
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -96,6 +100,7 @@ f5 (int i, ...)
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -116,6 +121,7 @@ f6 (int i, ...)
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 3 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -133,6 +139,7 @@ f7 (int i, ...)
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -152,6 +159,7 @@ f8 (int i, ...)
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -169,6 +177,7 @@ f9 (int i, ...)
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -188,6 +197,7 @@ f10 (int i, ...)
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -208,6 +218,7 @@ f11 (int i, ...)
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 3 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -228,6 +239,7 @@ f12 (int i, ...)
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 24 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and 3 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and 48 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -248,6 +260,7 @@ f13 (int i, ...)
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 24 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and 3 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and 48 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -268,6 +281,7 @@ f14 (int i, ...)
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 24 GPR units and 3" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 1 GPR units and 2 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -291,6 +305,7 @@ f15 (int i, ...)
+ /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f15: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ /* We may be able to improve upon this after fixing PR66010/PR66013. */
+ /* { dg-final { scan-tree-dump "f15: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+--- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c
+@@ -24,6 +24,7 @@ f1 (int i, ...)
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -39,6 +40,7 @@ f2 (int i, ...)
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -57,6 +59,7 @@ f3 (int i, ...)
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -73,6 +76,7 @@ f4 (int i, ...)
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -89,6 +93,7 @@ f5 (int i, ...)
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -107,6 +112,7 @@ f6 (int i, ...)
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -123,6 +129,7 @@ f7 (int i, ...)
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -139,6 +146,7 @@ f8 (int i, ...)
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -155,6 +163,7 @@ f10 (int i, ...)
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -171,6 +180,7 @@ f11 (int i, ...)
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -187,6 +197,7 @@ f12 (int i, ...)
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+--- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c
+@@ -27,6 +27,7 @@ f1 (int i, ...)
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -44,6 +45,7 @@ f2 (int i, ...)
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -67,6 +69,7 @@ f3 (int i, ...)
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+@@ -88,6 +91,7 @@ f4 (int i, ...)
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 8 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and 1 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and 16 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+--- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c
+@@ -25,6 +25,7 @@ f1 (int i, ...)
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ void
+ f2 (int i, ...)
+@@ -38,6 +39,7 @@ f2 (int i, ...)
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and all FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ /* Here va_arg can be executed at most as many times as va_start. */
+ void
+@@ -56,6 +58,7 @@ f3 (int i, ...)
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 32 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ void
+ f4 (int i, ...)
+@@ -74,6 +77,7 @@ f4 (int i, ...)
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 16 GPR units and 16 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 2 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ void
+ f5 (int i, ...)
+@@ -88,6 +92,7 @@ f5 (int i, ...)
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 16 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 32 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save (4|2) GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 16 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ void
+ f6 (int i, ...)
+@@ -102,6 +107,7 @@ f6 (int i, ...)
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 32 GPR units and 3" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|2) GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target aarch64*-*-* } } } */
+
+ void
+ f7 (int i, ...)
+@@ -116,3 +122,4 @@ f7 (int i, ...)
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 0 GPR units and 64 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 32 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 2 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 0 GPR units and 64 FPR units" "stdarg" { target aarch64*-*-* } } } */
+--- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c
++++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c
+@@ -30,6 +30,7 @@ bar (int x, char const *y, ...)
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
++/* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
+ /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-70.c
+@@ -0,0 +1,33 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target section_anchors } */
++/* { dg-require-effective-target vect_int } */
++
++#define N 32
++
++/* Increase alignment of struct if an array's offset is multiple of alignment of
++ vector type corresponding to it's scalar type.
++ For the below test-case:
++ offsetof(e) == 8 bytes.
++ i) For arm: let x = alignment of vector type corresponding to int,
++ x == 8 bytes.
++ Since offsetof(e) % x == 0, set DECL_ALIGN(a, b, c) to x.
++ ii) For aarch64, ppc: x == 16 bytes.
++ Since offsetof(e) % x != 0, don't increase alignment of a, b, c.
++*/
++
++static struct A {
++ int p1, p2;
++ int e[N];
++} a, b, c;
++
++int foo(void)
++{
++ for (int i = 0; i < N; i++)
++ a.e[i] = b.e[i] + c.e[i];
++
++ return a.e[0];
++}
++
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-71.c
+@@ -0,0 +1,25 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target section_anchors } */
++/* { dg-require-effective-target vect_int } */
++
++/* Should not increase alignment of the struct because
++ sizeof (A.e) < sizeof(corresponding vector type). */
++
++#define N 3
++
++static struct A {
++ int p1, p2;
++ int e[N];
++} a, b, c;
++
++int foo(void)
++{
++ for (int i = 0; i < N; i++)
++ a.e[i] = b.e[i] + c.e[i];
++
++ return a.e[0];
++}
++
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target arm*-*-* } } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/aligned-section-anchors-vect-72.c
+@@ -0,0 +1,29 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target section_anchors } */
++/* { dg-require-effective-target vect_int } */
++
++#define N 32
++
++/* Clone of section-anchors-vect-70.c having nested struct. */
++
++struct S
++{
++ int e[N];
++};
++
++static struct A {
++ int p1, p2;
++ struct S s;
++} a, b, c;
++
++int foo(void)
++{
++ for (int i = 0; i < N; i++)
++ a.s.e[i] = b.s.e[i] + c.s.e[i];
++
++ return a.s.e[0];
++}
++
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target aarch64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 0 "increase_alignment" { target powerpc64*-*-* } } } */
++/* { dg-final { scan-ipa-dump-times "Increasing alignment of decl" 3 "increase_alignment" { target arm*-*-* } } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/pr57206.c
+@@ -0,0 +1,11 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target vect_float } */
++
++void bad0(float * d, unsigned int n)
++{
++ unsigned int i;
++ for (i=n; i>0; --i)
++ d[n-i] = 0.0;
++}
++
++/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/pr65951.c
+@@ -0,0 +1,63 @@
++/* { dg-require-effective-target vect_int } */
++
++#include <stdarg.h>
++#include "tree-vect.h"
++
++#define N 512
++
++/* These multiplications should be vectorizable with additions when
++ no vector shift is available. */
++
++__attribute__ ((noinline)) void
++foo (int *arr)
++{
++ for (int i = 0; i < N; i++)
++ arr[i] *= 2;
++}
++
++__attribute__ ((noinline)) void
++foo2 (int *arr)
++{
++ for (int i = 0; i < N; i++)
++ arr[i] *= 4;
++}
++
++int
++main (void)
++{
++ check_vect ();
++ int data[N];
++ int i;
++
++ for (i = 0; i < N; i++)
++ {
++ data[i] = i;
++ __asm__ volatile ("");
++ }
++
++ foo (data);
++ for (i = 0; i < N; i++)
++ {
++ if (data[i] / 2 != i)
++ __builtin_abort ();
++ __asm__ volatile ("");
++ }
++
++ for (i = 0; i < N; i++)
++ {
++ data[i] = i;
++ __asm__ volatile ("");
++ }
++
++ foo2 (data);
++ for (i = 0; i < N; i++)
++ {
++ if (data[i] / 4 != i)
++ __builtin_abort ();
++ __asm__ volatile ("");
++ }
++
++ return 0;
++}
++
++/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/pr71818.c
+@@ -0,0 +1,16 @@
++/* { dg-do compile } */
++
++char a;
++short b;
++int c, d;
++void fn1() {
++ char e = 75, g;
++ unsigned char *f = &e;
++ a = 21;
++ for (; a <= 48; a++) {
++ for (; e <= 6;)
++ ;
++ g -= e -= b || g <= c;
++ }
++ d = *f;
++}
+--- a/src/gcc/testsuite/gcc.dg/vect/vect-iv-9.c
++++ b/src/gcc/testsuite/gcc.dg/vect/vect-iv-9.c
+@@ -33,5 +33,4 @@ int main (void)
+ return 0;
+ }
+
+-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_int_mult } } } */
+-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target {! vect_int_mult } } } } */
++/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c
+@@ -0,0 +1,13 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target vect_int } */
++/* { dg-require-effective-target vect_load_lanes } */
++
++void
++f (int *__restrict a, int *__restrict b)
++{
++ for (int i = 0; i < 96; ++i)
++ a[i] = b[i * 3] + b[i * 3 + 1] + b[i * 3 + 2];
++}
++
++/* { dg-final { scan-tree-dump-not "Data access with gaps" "vect" } } */
++/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
+@@ -0,0 +1,41 @@
++/* { dg-require-effective-target vect_int } */
++/* { dg-require-effective-target vect_shift } */
++
++#include <stdarg.h>
++#include "tree-vect.h"
++
++#define N 256
++
++__attribute__ ((noinline)) void
++foo (long long *arr)
++{
++ for (int i = 0; i < N; i++)
++ arr[i] *= 123;
++}
++
++int
++main (void)
++{
++ check_vect ();
++ long long data[N];
++ int i;
++
++ for (i = 0; i < N; i++)
++ {
++ data[i] = i;
++ __asm__ volatile ("");
++ }
++
++ foo (data);
++ for (i = 0; i < N; i++)
++ {
++ if (data[i] / 123 != i)
++ __builtin_abort ();
++ __asm__ volatile ("");
++ }
++
++ return 0;
++}
++
++/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
++/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
+@@ -0,0 +1,40 @@
++/* { dg-require-effective-target vect_int } */
++
++#include <stdarg.h>
++#include "tree-vect.h"
++
++#define N 256
++
++__attribute__ ((noinline)) void
++foo (long long *arr)
++{
++ for (int i = 0; i < N; i++)
++ arr[i] *= -19594LL;
++}
++
++int
++main (void)
++{
++ check_vect ();
++ long long data[N];
++ int i;
++
++ for (i = 0; i < N; i++)
++ {
++ data[i] = i;
++ __asm__ volatile ("");
++ }
++
++ foo (data);
++ for (i = 0; i < N; i++)
++ {
++ if (data[i] / -19594LL != i)
++ __builtin_abort ();
++ __asm__ volatile ("");
++ }
++
++ return 0;
++}
++
++/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
++/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+@@ -53,7 +53,10 @@ torture-init
+ set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
+ # Make sure Neon flags are provided, if necessary. Use fp16 if we can.
+-if {[check_effective_target_arm_neon_fp16_ok]} then {
++# Use fp16 arithmetic operations if the hardware supports it.
++if {[check_effective_target_arm_v8_2a_fp16_neon_hw]} then {
++ set additional_flags [add_options_for_arm_v8_2a_fp16_neon ""]
++} elseif {[check_effective_target_arm_neon_fp16_ok]} then {
+ set additional_flags [add_options_for_arm_neon_fp16 ""]
+ } else {
+ set additional_flags [add_options_for_arm_neon ""]
--__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
--vmul_n_f32 (float32x2_t a, float32_t b)
--{
-- float32x2_t result;
-- __asm__ ("fmul %0.2s,%1.2s,%2.s[0]"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
--vmul_n_s16 (int16x4_t a, int16_t b)
--{
-- int16x4_t result;
-- __asm__ ("mul %0.4h,%1.4h,%2.h[0]"
-- : "=w"(result)
-- : "w"(a), "x"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
--vmul_n_s32 (int32x2_t a, int32_t b)
--{
-- int32x2_t result;
-- __asm__ ("mul %0.2s,%1.2s,%2.s[0]"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
--vmul_n_u16 (uint16x4_t a, uint16_t b)
--{
-- uint16x4_t result;
-- __asm__ ("mul %0.4h,%1.4h,%2.h[0]"
-- : "=w"(result)
-- : "w"(a), "x"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
--vmul_n_u32 (uint32x2_t a, uint32_t b)
--{
-- uint32x2_t result;
-- __asm__ ("mul %0.2s,%1.2s,%2.s[0]"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- #define vmull_high_lane_s16(a, b, c) \
- __extension__ \
- ({ \
-@@ -8443,227 +8093,6 @@ vmull_u32 (uint32x2_t a, uint32x2_t b)
- return result;
- }
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+@@ -16,6 +16,14 @@ extern void *memset(void *, int, size_t);
+ extern void *memcpy(void *, const void *, size_t);
+ extern size_t strlen(const char *);
+
++/* Helper macro to select FP16 tests. */
++#if (defined (__ARM_FP16_FORMAT_IEEE) \
++ || defined (__ARM_FP16_FORMAT_ALTERNATIVE))
++#define FP16_SUPPORTED (1)
++#else
++#undef FP16_SUPPORTED
++#endif
++
+ /* Various string construction helpers. */
+
+ /*
+@@ -81,7 +89,7 @@ extern size_t strlen(const char *);
+ abort(); \
+ } \
+ } \
+- fprintf(stderr, "CHECKED %s\n", MSG); \
++ fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \
+ }
+
+ /* Floating-point variant. */
+@@ -110,7 +118,7 @@ extern size_t strlen(const char *);
+ abort(); \
+ } \
+ } \
+- fprintf(stderr, "CHECKED %s\n", MSG); \
++ fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \
+ }
+
+ /* Clean buffer with a non-zero pattern to help diagnose buffer
+@@ -133,10 +141,16 @@ static ARRAY(result, uint, 32, 2);
+ static ARRAY(result, uint, 64, 1);
+ static ARRAY(result, poly, 8, 8);
+ static ARRAY(result, poly, 16, 4);
++#if defined (__ARM_FEATURE_CRYPTO)
++static ARRAY(result, poly, 64, 1);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ static ARRAY(result, float, 16, 4);
+ #endif
+ static ARRAY(result, float, 32, 2);
++#ifdef __aarch64__
++static ARRAY(result, float, 64, 1);
++#endif
+ static ARRAY(result, int, 8, 16);
+ static ARRAY(result, int, 16, 8);
+ static ARRAY(result, int, 32, 4);
+@@ -147,6 +161,9 @@ static ARRAY(result, uint, 32, 4);
+ static ARRAY(result, uint, 64, 2);
+ static ARRAY(result, poly, 8, 16);
+ static ARRAY(result, poly, 16, 8);
++#if defined (__ARM_FEATURE_CRYPTO)
++static ARRAY(result, poly, 64, 2);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ static ARRAY(result, float, 16, 8);
+ #endif
+@@ -169,6 +186,7 @@ extern ARRAY(expected, poly, 8, 8);
+ extern ARRAY(expected, poly, 16, 4);
+ extern ARRAY(expected, hfloat, 16, 4);
+ extern ARRAY(expected, hfloat, 32, 2);
++extern ARRAY(expected, hfloat, 64, 1);
+ extern ARRAY(expected, int, 8, 16);
+ extern ARRAY(expected, int, 16, 8);
+ extern ARRAY(expected, int, 32, 4);
+@@ -335,7 +353,8 @@ extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
+ strlen(COMMENT) > 0 ? " " COMMENT : ""); \
+ abort(); \
+ } \
+- fprintf(stderr, "CHECKED CUMULATIVE SAT %s\n", MSG); \
++ fprintf(stderr, "CHECKED CUMULATIVE SAT %s %s\n", \
++ STR(VECT_TYPE(T, W, N)), MSG); \
+ }
+
+ #define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment) \
+@@ -500,15 +519,6 @@ static void clean_results (void)
+ /* Helpers to initialize vectors. */
+ #define VDUP(VAR, Q, T1, T2, W, N, V) \
+ VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
+-#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+-/* Work around that there is no vdup_n_f16 intrinsic. */
+-#define vdup_n_f16(VAL) \
+- __extension__ \
+- ({ \
+- float16_t f = VAL; \
+- vld1_dup_f16(&f); \
+- })
+-#endif
+
+ #define VSET_LANE(VAR, Q, T1, T2, W, N, L, V) \
+ VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V, \
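Two things change in arm-neon-ref.h: the CHECK messages now print the vector type being checked, and a single FP16_SUPPORTED macro guards half-precision test code. The local vdup_n_f16 workaround (built on vld1_dup_f16) is dropped, presumably because the updated arm_neon.h provides vdup_n_f16 directly. A hypothetical fragment of a test using the new guard (only the macro names come from the patch):

  #if defined (FP16_SUPPORTED)
    /* Previously this had to go through the vld1_dup_f16 workaround.  */
    float16x4_t v = vdup_n_f16 (2.5);
  #endif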
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_float.inc
+@@ -0,0 +1,170 @@
++/* Floating-point only version of binary_op_no64.inc template. Currently only
++ float16_t is used. */
++
++#include <math.h>
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1(NAME)
++
++void FNNAME (INSN_NAME) (void)
++{
++ int i;
++
++ /* Basic test: z = INSN (x, y), then store the result. */
++#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \
++ VECT_VAR(vector2, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
++
++#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N) \
++ TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \
++
++#ifdef HAS_FLOAT16_VARIANT
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 4);
++
++ DECL_VARIABLE(vector, float, 16, 8);
++ DECL_VARIABLE(vector2, float, 16, 8);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
++
++#ifdef HAS_FLOAT_VARIANT
++ DECL_VARIABLE(vector, float, 32, 2);
++ DECL_VARIABLE(vector2, float, 32, 2);
++ DECL_VARIABLE(vector_res, float, 32, 2);
++
++ DECL_VARIABLE(vector, float, 32, 4);
++ DECL_VARIABLE(vector2, float, 32, 4);
++ DECL_VARIABLE(vector_res, float, 32, 4);
++#endif
++
++ clean_results ();
++
++ /* Initialize input "vector" from "buffer". */
++#ifdef HAS_FLOAT16_VARIANT
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++#ifdef HAS_FLOAT_VARIANT
++ VLOAD(vector, buffer, , float, f, 32, 2);
++ VLOAD(vector, buffer, q, float, f, 32, 4);
++#endif
++
++ /* Choose init value arbitrarily, will be used as comparison value. */
++#ifdef HAS_FLOAT16_VARIANT
++ VDUP(vector2, , float, f, 16, 4, -15.5f);
++ VDUP(vector2, q, float, f, 16, 8, -14.5f);
++#endif
++#ifdef HAS_FLOAT_VARIANT
++ VDUP(vector2, , float, f, 32, 2, -15.5f);
++ VDUP(vector2, q, float, f, 32, 4, -14.5f);
++#endif
++
++#ifdef HAS_FLOAT16_VARIANT
++#define FLOAT16_VARIANT(MACRO, VAR) \
++ MACRO(VAR, , float, f, 16, 4); \
++ MACRO(VAR, q, float, f, 16, 8);
++#else
++#define FLOAT16_VARIANT(MACRO, VAR)
++#endif
++
++#ifdef HAS_FLOAT_VARIANT
++#define FLOAT_VARIANT(MACRO, VAR) \
++ MACRO(VAR, , float, f, 32, 2); \
++ MACRO(VAR, q, float, f, 32, 4);
++#else
++#define FLOAT_VARIANT(MACRO, VAR)
++#endif
++
++#define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR) \
++
++ /* Apply a binary operator named INSN_NAME. */
++ FLOAT16_VARIANT(TEST_BINARY_OP, INSN_NAME);
++ FLOAT_VARIANT(TEST_BINARY_OP, INSN_NAME);
++
++#ifdef HAS_FLOAT16_VARIANT
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++
++ /* Extra FP tests with special values (NaN, ....) */
++ VDUP(vector, q, float, f, 16, 8, 1.0f);
++ VDUP(vector2, q, float, f, 16, 8, NAN);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan,
++ " FP special (NaN)");
++
++ VDUP(vector, q, float, f, 16, 8, -NAN);
++ VDUP(vector2, q, float, f, 16, 8, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_mnan,
++ " FP special (-NaN)");
++
++ VDUP(vector, q, float, f, 16, 8, 1.0f);
++ VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_inf,
++ " FP special (inf)");
++
++ VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
++ VDUP(vector2, q, float, f, 16, 8, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_minf,
++ " FP special (-inf)");
++
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, q, float, f, 16, 8, -0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero1,
++ " FP special (-0.0)");
++
++ VDUP(vector, q, float, f, 16, 8, -0.0f);
++ VDUP(vector2, q, float, f, 16, 8, 0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero2,
++ " FP special (-0.0)");
++#endif
++
++#ifdef HAS_FLOAT_VARIANT
++ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
++
++ /* Extra FP tests with special values (NaN, ....) */
++ VDUP(vector, q, float, f, 32, 4, 1.0f);
++ VDUP(vector2, q, float, f, 32, 4, NAN);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_nan, " FP special (NaN)");
++
++ VDUP(vector, q, float, f, 32, 4, -NAN);
++ VDUP(vector2, q, float, f, 32, 4, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_mnan, " FP special (-NaN)");
++
++ VDUP(vector, q, float, f, 32, 4, 1.0f);
++ VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_inf, " FP special (inf)");
++
++ VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
++ VDUP(vector2, q, float, f, 32, 4, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_minf, " FP special (-inf)");
++
++ VDUP(vector, q, float, f, 32, 4, 0.0f);
++ VDUP(vector2, q, float, f, 32, 4, -0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero1, " FP special (-0.0)");
++
++ VDUP(vector, q, float, f, 32, 4, -0.0f);
++ VDUP(vector2, q, float, f, 32, 4, 0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4);
++ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero2, " FP special (-0.0)");
++#endif
++}
++
++int main (void)
++{
++ FNNAME (INSN_NAME) ();
++ return 0;
++}
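binary_op_float.inc is a template: the file that includes it defines INSN_NAME, TEST_MSG and the HAS_FLOAT16_VARIANT / HAS_FLOAT_VARIANT switches, and provides the expected-result arrays read by the CHECK_FP calls (expected, expected_nan, expected_mnan, expected_inf, expected_minf, expected_zero1, expected_zero2). A hypothetical consumer for a float16 binary intrinsic might look like the sketch below; the intrinsic name is illustrative and the zeroed arrays are placeholders, not real expected values.

  /* Hypothetical consumer of binary_op_float.inc; values are placeholders.  */
  #include <arm_neon.h>
  #include "arm-neon-ref.h"
  #include "compute-ref-data.h"

  #define INSN_NAME vmax
  #define TEST_MSG "VMAX/VMAXQ"

  #if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  #define HAS_FLOAT16_VARIANT
  #endif

  /* One array per CHECK_FP in the template (placeholder contents).  */
  VECT_VAR_DECL (expected, hfloat, 16, 4) []       = { 0, 0, 0, 0 };
  VECT_VAR_DECL (expected, hfloat, 16, 8) []       = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_nan, hfloat, 16, 8) []   = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_mnan, hfloat, 16, 8) []  = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_inf, hfloat, 16, 8) []   = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_minf, hfloat, 16, 8) []  = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_zero1, hfloat, 16, 8) [] = { 0, 0, 0, 0, 0, 0, 0, 0 };
  VECT_VAR_DECL (expected_zero2, hfloat, 16, 8) [] = { 0, 0, 0, 0, 0, 0, 0, 0 };

  #include "binary_op_float.inc"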
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
+@@ -28,6 +28,10 @@ void FNNAME (INSN_NAME) (void)
+
+ /* Initialize input "vector" from "buffer". */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#ifdef HAS_FLOAT16_VARIANT
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ #ifdef HAS_FLOAT_VARIANT
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+@@ -46,15 +50,27 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, q, uint, u, 8, 16, 0xf9);
+ VDUP(vector2, q, uint, u, 16, 8, 0xfff2);
+ VDUP(vector2, q, uint, u, 32, 4, 0xfffffff1);
++#ifdef HAS_FLOAT16_VARIANT
++ VDUP(vector2, , float, f, 16, 4, -15.5f);
++ VDUP(vector2, q, float, f, 16, 8, -14.5f);
++#endif
+ #ifdef HAS_FLOAT_VARIANT
+ VDUP(vector2, , float, f, 32, 2, -15.5f);
+ VDUP(vector2, q, float, f, 32, 4, -14.5f);
+ #endif
+
++#ifdef HAS_FLOAT16_VARIANT
++#define FLOAT16_VARIANT(MACRO, VAR) \
++ MACRO(VAR, , float, f, 16, 4); \
++ MACRO(VAR, q, float, f, 16, 8);
++#else
++#define FLOAT16_VARIANT(MACRO, VAR)
++#endif
++
+ #ifdef HAS_FLOAT_VARIANT
+ #define FLOAT_VARIANT(MACRO, VAR) \
+ MACRO(VAR, , float, f, 32, 2); \
+- MACRO(VAR, q, float, f, 32, 4)
++ MACRO(VAR, q, float, f, 32, 4);
+ #else
+ #define FLOAT_VARIANT(MACRO, VAR)
+ #endif
+@@ -72,7 +88,8 @@ void FNNAME (INSN_NAME) (void)
+ MACRO(VAR, q, uint, u, 8, 16); \
+ MACRO(VAR, q, uint, u, 16, 8); \
+ MACRO(VAR, q, uint, u, 32, 4); \
+- FLOAT_VARIANT(MACRO, VAR)
++ FLOAT_VARIANT(MACRO, VAR); \
++ FLOAT16_VARIANT(MACRO, VAR);
+
+ /* Apply a binary operator named INSN_NAME. */
+ TEST_MACRO_NO64BIT_VARIANT_1_5(TEST_BINARY_OP, INSN_NAME);
+@@ -90,6 +107,42 @@ void FNNAME (INSN_NAME) (void)
+ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+
++#ifdef HAS_FLOAT16_VARIANT
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++
++ /* Extra FP tests with special values (NaN, ....) */
++ VDUP(vector, q, float, f, 16, 8, 1.0f);
++ VDUP(vector2, q, float, f, 16, 8, NAN);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan, " FP special (NaN)");
++
++ VDUP(vector, q, float, f, 16, 8, -NAN);
++ VDUP(vector2, q, float, f, 16, 8, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_mnan, " FP special (-NaN)");
++
++ VDUP(vector, q, float, f, 16, 8, 1.0f);
++ VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_inf, " FP special (inf)");
++
++ VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
++ VDUP(vector2, q, float, f, 16, 8, 1.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_minf, " FP special (-inf)");
++
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, q, float, f, 16, 8, -0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero1, " FP special (-0.0)");
++
++ VDUP(vector, q, float, f, 16, 8, -0.0f);
++ VDUP(vector2, q, float, f, 16, 8, 0.0f);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_zero2, " FP special (-0.0)");
++#endif
++
+ #ifdef HAS_FLOAT_VARIANT
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
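The new float16 paths in binary_op_no64.inc are only reachable when the including test defines HAS_FLOAT16_VARIANT; the natural way for a consumer to do that is to key it off the compiler's own feature macro, mirroring how the existing HAS_FLOAT_VARIANT switch is used (a sketch):

  /* In the test file, before including binary_op_no64.inc.  */
  #if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  #define HAS_FLOAT16_VARIANT
  #endif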
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc
+@@ -0,0 +1,160 @@
++/* Template file for binary scalar operator validation.
++
++ This file is meant to be included by test files for binary scalar
++ operations. */
++
++/* Check for required settings. */
++
++#ifndef INSN_NAME
++#error INSN_NAME (the intrinsic to test) must be defined.
++#endif
++
++#ifndef INPUT_TYPE
++#error INPUT_TYPE (basic type of an input value) must be defined.
++#endif
++
++#ifndef OUTPUT_TYPE
++#error OUTPUT_TYPE (basic type of an output value) must be defined.
++#endif
++
++#ifndef OUTPUT_TYPE_SIZE
++#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
++#endif
++
++/* Optional settings:
++
++ INPUT_1: Input values for the first parameter. Must be of type INPUT_TYPE.
++ INPUT_2: Input values for the second parameter. Must be of type
++ INPUT_TYPE. */
++
++#ifndef TEST_MSG
++#define TEST_MSG "unnamed test"
++#endif
++
++/* The test framework. */
++
++#include <stdio.h>
++
++extern void abort ();
++
++#define INFF __builtin_inf ()
++
++/* Stringify a macro. */
++#define STR0(A) #A
++#define STR(A) STR0 (A)
++
++/* Macro concatenation. */
++#define CAT0(A, B) A##B
++#define CAT(A, B) CAT0 (A, B)
++
++/* Format strings for error reporting. */
++#define FMT16 "0x%04x"
++#define FMT32 "0x%08x"
++#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
++
++/* Type construction: forms TS_t, where T is the base type and S the size in
++ bits. */
++#define MK_TYPE0(T, S) T##S##_t
++#define MK_TYPE(T, S) MK_TYPE0 (T, S)
++
++/* Convenience types for input and output data. */
++typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
++
++/* Conversion between typed values and their hexadecimal representation. */
++typedef union
++{
++ OUTPUT_TYPE value;
++ output_hex_type hex;
++} output_conv_type;
++
++/* Default input values. */
++
++float16_t input_1_float16_t[] =
++{
++ 0.0, -0.0,
++ 2.0, 3.1,
++ 20.0, 0.40,
++ -2.3, 1.33,
++ -7.6, 0.31,
++ 0.3353, 0.5,
++ 1.0, 13.13,
++ -6.3, 20.0,
++ (float16_t)INFF, (float16_t)-INFF,
++};
++
++float16_t input_2_float16_t[] =
++{
++ 1.0, 1.0,
++ -4.33, 100.0,
++ 30.0, -0.02,
++ 0.5, -7.231,
++ -6.3, 20.0,
++ -7.231, 2.3,
++ -7.6, 5.1,
++ 0.31, 0.33353,
++ (float16_t)-INFF, (float16_t)INFF,
++};
++
++#ifndef INPUT_1
++#define INPUT_1 CAT (input_1_,INPUT_TYPE)
++#endif
++
++#ifndef INPUT_2
++#define INPUT_2 CAT (input_2_,INPUT_TYPE)
++#endif
++
++/* Support macros and routines for the test function. */
++
++#define CHECK() \
++ { \
++ output_conv_type actual; \
++ output_conv_type expect; \
++ \
++ expect.hex = ((output_hex_type*)EXPECTED)[index]; \
++ actual.value = INSN_NAME ((INPUT_1)[index], \
++ (INPUT_2)[index]); \
++ \
++ if (actual.hex != expect.hex) \
++ { \
++ fprintf (stderr, \
++ "ERROR in %s (%s line %d), buffer %s, " \
++ "index %d: got " \
++ FMT " != " FMT "\n", \
++ TEST_MSG, __FILE__, __LINE__, \
++ STR (EXPECTED), index, \
++ actual.hex, expect.hex); \
++ abort (); \
++ } \
++ fprintf (stderr, "CHECKED %s %s\n", \
++ STR (EXPECTED), TEST_MSG); \
++ }
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1 (NAME)
++
++/* The test function. */
++
++void
++FNNAME (INSN_NAME) (void)
++{
++ /* Basic test: y[i] = OP (x1[i], x2[i]), for each pair of input values, then
++ compare the result against EXPECTED[i]. */
++
++ const int num_tests = sizeof (INPUT_1) / sizeof (INPUT_1[0]);
++ int index;
++
++ for (index = 0; index < num_tests; index++)
++ CHECK ();
++
++#ifdef EXTRA_TESTS
++ EXTRA_TESTS ();
++#endif
++}
++
++int
++main (void)
++{
++ FNNAME (INSN_NAME) ();
++
++ return 0;
++}
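binary_scalar_op.inc validates scalar (non-vector) intrinsics bit-for-bit: each result is reinterpreted as an unsigned integer of OUTPUT_TYPE_SIZE bits and compared against an EXPECTED table. A hypothetical consumer for the ARMv8.2-A half-precision scalar add could look like the following; the intrinsic name is the ACLE one, not something introduced by this patch, and the expected values are placeholders.

  /* Hypothetical consumer of binary_scalar_op.inc (placeholder results).  */
  #include <stdint.h>
  #include <arm_fp16.h>

  #define INSN_NAME vaddh_f16
  #define INPUT_TYPE float16_t
  #define OUTPUT_TYPE float16_t
  #define OUTPUT_TYPE_SIZE 16
  #define TEST_MSG "VADDH_F16"

  /* One 16-bit pattern per entry of the default 18-element input arrays.  */
  uint16_t expected[18] = { 0 };   /* placeholder values only */
  #define EXPECTED expected

  #include "binary_scalar_op.inc"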
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_fp_op.inc
+@@ -15,6 +15,10 @@
+ each test file. */
+ extern ARRAY(expected2, uint, 32, 2);
+ extern ARRAY(expected2, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++extern ARRAY(expected2, uint, 16, 4);
++extern ARRAY(expected2, uint, 16, 8);
++#endif
+ #define FNNAME1(NAME) exec_ ## NAME
+ #define FNNAME(NAME) FNNAME1(NAME)
+@@ -37,17 +41,33 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector2, float, 32, 4);
+ DECL_VARIABLE(vector_res, uint, 32, 2);
+ DECL_VARIABLE(vector_res, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++ DECL_VARIABLE(vector_res, uint, 16, 4);
++ DECL_VARIABLE(vector_res, uint, 16, 8);
++#endif
+
+ clean_results ();
+
+ /* Initialize input "vector" from "buffer". */
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+
+ /* Choose init value arbitrarily, will be used for vector
+ comparison. */
+ VDUP(vector2, , float, f, 32, 2, -16.0f);
+ VDUP(vector2, q, float, f, 32, 4, -14.0f);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, -16.0f);
++ VDUP(vector2, q, float, f, 16, 8, -14.0f);
++#endif
+
+ /* Apply operator named INSN_NAME. */
+ TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+@@ -56,15 +76,36 @@ void FNNAME (INSN_NAME) (void)
+ TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VCOMP(INSN_NAME, , float, f, uint, 16, 4);
++ CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected, "");
++
++ TEST_VCOMP(INSN_NAME, q, float, f, uint, 16, 8);
++ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
++#endif
++
+ /* Test again, with different input values. */
+ VDUP(vector2, , float, f, 32, 2, -10.0f);
+ VDUP(vector2, q, float, f, 32, 4, 10.0f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, -10.0f);
++ VDUP(vector2, q, float, f, 16, 8, 10.0f);
++#endif
++
+ TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected2, "");
+
+ TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected2,"");
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VCOMP(INSN_NAME, , float, f, uint, 16, 4);
++ CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected2, "");
++
++ TEST_VCOMP(INSN_NAME, q, float, f, uint, 16, 8);
++ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected2,"");
++#endif
+ }
+
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_op.inc
+@@ -11,6 +11,17 @@ extern ARRAY(expected_uint, uint, 32, 2);
+ extern ARRAY(expected_q_uint, uint, 8, 16);
+ extern ARRAY(expected_q_uint, uint, 16, 8);
+ extern ARRAY(expected_q_uint, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++extern ARRAY(expected_float, uint, 16, 4);
++extern ARRAY(expected_q_float, uint, 16, 8);
++extern ARRAY(expected_nan, uint, 16, 4);
++extern ARRAY(expected_mnan, uint, 16, 4);
++extern ARRAY(expected_nan2, uint, 16, 4);
++extern ARRAY(expected_inf, uint, 16, 4);
++extern ARRAY(expected_minf, uint, 16, 4);
++extern ARRAY(expected_inf2, uint, 16, 4);
++extern ARRAY(expected_mzero, uint, 16, 4);
++#endif
+ extern ARRAY(expected_float, uint, 32, 2);
+ extern ARRAY(expected_q_float, uint, 32, 4);
+ extern ARRAY(expected_uint2, uint, 32, 2);
+@@ -48,6 +59,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector, uint, 8, 8);
+ DECL_VARIABLE(vector, uint, 16, 4);
+ DECL_VARIABLE(vector, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE (vector, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, int, 8, 16);
+ DECL_VARIABLE(vector, int, 16, 8);
+@@ -55,6 +69,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector, uint, 8, 16);
+ DECL_VARIABLE(vector, uint, 16, 8);
+ DECL_VARIABLE(vector, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE (vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 4);
+
+ DECL_VARIABLE(vector2, int, 8, 8);
+@@ -63,6 +80,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector2, uint, 8, 8);
+ DECL_VARIABLE(vector2, uint, 16, 4);
+ DECL_VARIABLE(vector2, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE (vector2, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector2, float, 32, 2);
+ DECL_VARIABLE(vector2, int, 8, 16);
+ DECL_VARIABLE(vector2, int, 16, 8);
+@@ -70,6 +90,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector2, uint, 8, 16);
+ DECL_VARIABLE(vector2, uint, 16, 8);
+ DECL_VARIABLE(vector2, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE (vector2, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector2, float, 32, 4);
+
+ DECL_VARIABLE(vector_res, uint, 8, 8);
+@@ -88,6 +111,9 @@ void FNNAME (INSN_NAME) (void)
+ VLOAD(vector, buffer, , uint, u, 8, 8);
+ VLOAD(vector, buffer, , uint, u, 16, 4);
+ VLOAD(vector, buffer, , uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD (vector, buffer, , float, f, 16, 4);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+
+ VLOAD(vector, buffer, q, int, s, 8, 16);
+@@ -96,6 +122,9 @@ void FNNAME (INSN_NAME) (void)
+ VLOAD(vector, buffer, q, uint, u, 8, 16);
+ VLOAD(vector, buffer, q, uint, u, 16, 8);
+ VLOAD(vector, buffer, q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD (vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+
+ /* Choose init value arbitrarily, will be used for vector
+@@ -106,6 +135,9 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, , uint, u, 8, 8, 0xF3);
+ VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
+ VDUP(vector2, , uint, u, 32, 2, 0xFFFFFFF1);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP (vector2, , float, f, 16, 4, -15.0f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, -15.0f);
+
+ VDUP(vector2, q, int, s, 8, 16, -4);
+@@ -114,6 +146,9 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, q, uint, u, 8, 16, 0xF4);
+ VDUP(vector2, q, uint, u, 16, 8, 0xFFF6);
+ VDUP(vector2, q, uint, u, 32, 4, 0xFFFFFFF2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP (vector2, q, float, f, 16, 8, -14.0f);
++#endif
+ VDUP(vector2, q, float, f, 32, 4, -14.0f);
+
+ /* The comparison operators produce only unsigned results, which
+@@ -154,9 +189,17 @@ void FNNAME (INSN_NAME) (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_q_uint, "");
+
+ /* The float variants. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_float, "");
++#endif
+ TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_float, "");
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VCOMP (INSN_NAME, q, float, f, uint, 16, 8);
++ CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected_q_float, "");
++#endif
+ TEST_VCOMP(INSN_NAME, q, float, f, uint, 32, 4);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_q_float, "");
+
+@@ -176,6 +219,43 @@ void FNNAME (INSN_NAME) (void)
+
+
+ /* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP (vector, , float, f, 16, 4, 1.0);
++ VDUP (vector2, , float, f, 16, 4, NAN);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan, "FP special (NaN)");
++
++ VDUP (vector, , float, f, 16, 4, 1.0);
++ VDUP (vector2, , float, f, 16, 4, -NAN);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mnan, " FP special (-NaN)");
++
++ VDUP (vector, , float, f, 16, 4, NAN);
++ VDUP (vector2, , float, f, 16, 4, 1.0);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan2, " FP special (NaN)");
++
++ VDUP (vector, , float, f, 16, 4, 1.0);
++ VDUP (vector2, , float, f, 16, 4, HUGE_VALF);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf, " FP special (inf)");
++
++ VDUP (vector, , float, f, 16, 4, 1.0);
++ VDUP (vector2, , float, f, 16, 4, -HUGE_VALF);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_minf, " FP special (-inf)");
++
++ VDUP (vector, , float, f, 16, 4, HUGE_VALF);
++ VDUP (vector2, , float, f, 16, 4, 1.0);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf2, " FP special (inf)");
++
++ VDUP (vector, , float, f, 16, 4, -0.0);
++ VDUP (vector2, , float, f, 16, 4, 0.0);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mzero, " FP special (-0.0)");
++#endif
++
+ VDUP(vector, , float, f, 32, 2, 1.0);
+ VDUP(vector2, , float, f, 32, 2, NAN);
+ TEST_VCOMP(INSN_NAME, , float, f, uint, 32, 2);
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/cmp_zero_op.inc
+@@ -0,0 +1,111 @@
++/* Template file for the validation of compare against zero operators.
++
++ This file is based on cmp_op.inc. It is meant to be included by the relevant
++ test files, which have to define the intrinsic family to test. If a given
++ intrinsic supports variants which are not supported by all the other
++ operators, these can be tested by providing a definition for EXTRA_TESTS. */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++#include <math.h>
++
++/* Additional expected results declaration, they are initialized in
++ each test file. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++extern ARRAY(expected_float, uint, 16, 4);
++extern ARRAY(expected_q_float, uint, 16, 8);
++extern ARRAY(expected_uint2, uint, 16, 4);
++extern ARRAY(expected_uint3, uint, 16, 4);
++extern ARRAY(expected_uint4, uint, 16, 4);
++extern ARRAY(expected_nan, uint, 16, 4);
++extern ARRAY(expected_mnan, uint, 16, 4);
++extern ARRAY(expected_inf, uint, 16, 4);
++extern ARRAY(expected_minf, uint, 16, 4);
++extern ARRAY(expected_zero, uint, 16, 4);
++extern ARRAY(expected_mzero, uint, 16, 4);
++#endif
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1(NAME)
++
++void FNNAME (INSN_NAME) (void)
++{
++ /* Basic test: y=vcomp(x1,x2), then store the result. */
++#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \
++ VECT_VAR(vector_res, T3, W, N) = \
++ INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
++ vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vector_res, T3, W, N))
++
++#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N) \
++ TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
++
++ /* No need for 64 bits elements. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE (vector, float, 16, 4);
++ DECL_VARIABLE (vector, float, 16, 8);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, uint, 16, 4);
++ DECL_VARIABLE(vector_res, uint, 16, 8);
++#endif
++
++ clean_results ();
++
++ /* Choose init value arbitrarily, will be used for vector
++ comparison. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP (vector, , float, f, 16, 4, -15.0f);
++ VDUP (vector, q, float, f, 16, 8, 14.0f);
++#endif
++
++ /* Float variants. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ TEST_VCOMP (INSN_NAME, q, float, f, uint, 16, 8);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_float, "");
++ CHECK (TEST_MSG, uint, 16, 8, PRIx16, expected_q_float, "");
++#endif
++
++ /* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP (vector, , float, f, 16, 4, NAN);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_nan, "FP special (NaN)");
++
++ VDUP (vector, , float, f, 16, 4, -NAN);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mnan, " FP special (-NaN)");
++
++ VDUP (vector, , float, f, 16, 4, HUGE_VALF);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_inf, " FP special (inf)");
++
++ VDUP (vector, , float, f, 16, 4, -HUGE_VALF);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_minf, " FP special (-inf)");
++
++ VDUP (vector, , float, f, 16, 4, 0.0);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_zero, " FP special (0.0)");
++
++ VDUP (vector, , float, f, 16, 4, 0.0);
++ TEST_VCOMP (INSN_NAME, , float, f, uint, 16, 4);
++ CHECK (TEST_MSG, uint, 16, 4, PRIx16, expected_mzero, " FP special (-0.0)");
++#endif
++
++#ifdef EXTRA_TESTS
++ EXTRA_TESTS();
++#endif
++}
++
++int main (void)
++{
++ FNNAME (INSN_NAME) ();
++
++ return 0;
++}
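cmp_zero_op.inc follows the same convention for the new compare-against-zero intrinsics: the including file names the intrinsic and supplies the expected_* arrays declared extern above. Hypothetically, for vceqz (an ACLE intrinsic name, used here only for illustration):

  /* Hypothetical consumer of cmp_zero_op.inc.  */
  #define INSN_NAME vceqz
  #define TEST_MSG "VCEQZ/VCEQZQ"
  /* expected_float, expected_q_float, expected_nan, ... must be defined
     with real values before the include.  */
  #include "cmp_zero_op.inc"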
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
+@@ -118,6 +118,10 @@ VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
+ PAD(buffer_pad, uint, 32, 2);
+ VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
+ PAD(buffer_pad, uint, 64, 1);
++#if defined (__ARM_FEATURE_CRYPTO)
++VECT_VAR_DECL_INIT(buffer, poly, 64, 1);
++PAD(buffer_pad, poly, 64, 1);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ VECT_VAR_DECL_INIT(buffer, float, 16, 4);
+ PAD(buffer_pad, float, 16, 4);
+@@ -144,6 +148,10 @@ VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
+ PAD(buffer_pad, poly, 8, 16);
+ VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
+ PAD(buffer_pad, poly, 16, 8);
++#if defined (__ARM_FEATURE_CRYPTO)
++VECT_VAR_DECL_INIT(buffer, poly, 64, 2);
++PAD(buffer_pad, poly, 64, 2);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ VECT_VAR_DECL_INIT(buffer, float, 16, 8);
+ PAD(buffer_pad, float, 16, 8);
+@@ -178,6 +186,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
+ VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
+ VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
+ VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
++#if defined (__ARM_FEATURE_CRYPTO)
++VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 1);
++VECT_VAR_DECL(buffer_dup_pad, poly, 64, 1);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ VECT_VAR_DECL_INIT4(buffer_dup, float, 16, 4);
+ VECT_VAR_DECL(buffer_dup_pad, float, 16, 4);
+@@ -205,6 +217,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
+ VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
+ VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
+ VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
++#if defined (__ARM_FEATURE_CRYPTO)
++VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 2);
++VECT_VAR_DECL(buffer_dup_pad, poly, 64, 2);
++#endif
+ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+ VECT_VAR_DECL_INIT(buffer_dup, float, 16, 8);
+ VECT_VAR_DECL(buffer_dup_pad, float, 16, 8);
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
+@@ -0,0 +1,663 @@
++/* This file contains tests for all the *p64 intrinsics, except for
++ vreinterpret, which has its own testcase. */
++
++/* { dg-require-effective-target arm_crypto_ok } */
++/* { dg-add-options arm_crypto } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++/* Expected results: vbsl. */
++VECT_VAR_DECL(vbsl_expected,poly,64,1) [] = { 0xfffffff1 };
++VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfffffff1,
++ 0xfffffff1 };
++
++/* Expected results: vceq. */
++VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };
++
++/* Expected results: vcombine. */
++VECT_VAR_DECL(vcombine_expected,poly,64,2) [] = { 0xfffffffffffffff0, 0x88 };
++
++/* Expected results: vcreate. */
++VECT_VAR_DECL(vcreate_expected,poly,64,1) [] = { 0x123456789abcdef0 };
++
++/* Expected results: vdup_lane. */
++VECT_VAR_DECL(vdup_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vdup_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff0 };
++
++/* Expected results: vdup_n. */
++VECT_VAR_DECL(vdup_n_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vdup_n_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff0 };
++VECT_VAR_DECL(vdup_n_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vdup_n_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vdup_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vdup_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
++ 0xfffffffffffffff2 };
++
++/* Expected results: vext. */
++VECT_VAR_DECL(vext_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
++
++/* Expected results: vget_low. */
++VECT_VAR_DECL(vget_low_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++
++/* Expected results: vld1. */
++VECT_VAR_DECL(vld1_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld1_expected,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++
++/* Expected results: vld1_dup. */
++VECT_VAR_DECL(vld1_dup_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld1_dup_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld1_dup_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld1_dup_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld1_dup_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vld1_dup_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
++ 0xfffffffffffffff2 };
++
++/* Expected results: vld1_lane. */
++VECT_VAR_DECL(vld1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xaaaaaaaaaaaaaaaa };
++
++/* Expected results: vldX. */
++VECT_VAR_DECL(vld2_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld2_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld3_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld3_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld3_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vld4_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld4_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld4_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vld4_expected_3,poly,64,1) [] = { 0xfffffffffffffff3 };
++
++/* Expected results: vldX_dup. */
++VECT_VAR_DECL(vld2_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld2_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld3_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld3_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld3_dup_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vld4_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vld4_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(vld4_dup_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
++VECT_VAR_DECL(vld4_dup_expected_3,poly,64,1) [] = { 0xfffffffffffffff3 };
++
++/* Expected results: vsli. */
++VECT_VAR_DECL(vsli_expected,poly,64,1) [] = { 0x10 };
++VECT_VAR_DECL(vsli_expected,poly,64,2) [] = { 0x7ffffffffffff0,
++ 0x7ffffffffffff1 };
++VECT_VAR_DECL(vsli_expected_max_shift,poly,64,1) [] = { 0x7ffffffffffffff0 };
++VECT_VAR_DECL(vsli_expected_max_shift,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++
++/* Expected results: vsri. */
++VECT_VAR_DECL(vsri_expected,poly,64,1) [] = { 0xe000000000000000 };
++VECT_VAR_DECL(vsri_expected,poly,64,2) [] = { 0xfffffffffffff800,
++ 0xfffffffffffff800 };
++VECT_VAR_DECL(vsri_expected_max_shift,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vsri_expected_max_shift,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++
++/* Expected results: vst1_lane. */
++VECT_VAR_DECL(vst1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vst1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0x3333333333333333 };
++
++int main (void)
++{
++ int i;
++
++ /* vbsl_p64 tests. */
++#define TEST_MSG "VBSL/VBSLQ"
++
++#define TEST_VBSL(T3, Q, T1, T2, W, N) \
++ VECT_VAR(vbsl_vector_res, T1, W, N) = \
++ vbsl##Q##_##T2##W(VECT_VAR(vbsl_vector_first, T3, W, N), \
++ VECT_VAR(vbsl_vector, T1, W, N), \
++ VECT_VAR(vbsl_vector2, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vbsl_vector_res, T1, W, N))
++
++ DECL_VARIABLE(vbsl_vector, poly, 64, 1);
++ DECL_VARIABLE(vbsl_vector, poly, 64, 2);
++ DECL_VARIABLE(vbsl_vector2, poly, 64, 1);
++ DECL_VARIABLE(vbsl_vector2, poly, 64, 2);
++ DECL_VARIABLE(vbsl_vector_res, poly, 64, 1);
++ DECL_VARIABLE(vbsl_vector_res, poly, 64, 2);
++
++ DECL_VARIABLE(vbsl_vector_first, uint, 64, 1);
++ DECL_VARIABLE(vbsl_vector_first, uint, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vbsl_vector, buffer, , poly, p, 64, 1);
++ VLOAD(vbsl_vector, buffer, q, poly, p, 64, 2);
++
++ VDUP(vbsl_vector2, , poly, p, 64, 1, 0xFFFFFFF3);
++ VDUP(vbsl_vector2, q, poly, p, 64, 2, 0xFFFFFFF3);
++
++ VDUP(vbsl_vector_first, , uint, u, 64, 1, 0xFFFFFFF2);
++ VDUP(vbsl_vector_first, q, uint, u, 64, 2, 0xFFFFFFF2);
++
++ TEST_VBSL(uint, , poly, p, 64, 1);
++ TEST_VBSL(uint, q, poly, p, 64, 2);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vbsl_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vbsl_expected, "");
++
++ /* vceq_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VCEQ"
++
++#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \
++ VECT_VAR(vceq_vector_res, T3, W, N) = \
++ INSN##Q##_##T2##W(VECT_VAR(vceq_vector, T1, W, N), \
++ VECT_VAR(vceq_vector2, T1, W, N)); \
++ vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vceq_vector_res, T3, W, N))
++
++#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N) \
++ TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
++
++ DECL_VARIABLE(vceq_vector, poly, 64, 1);
++ DECL_VARIABLE(vceq_vector2, poly, 64, 1);
++ DECL_VARIABLE(vceq_vector_res, uint, 64, 1);
++
++ CLEAN(result, uint, 64, 1);
++
++ VLOAD(vceq_vector, buffer, , poly, p, 64, 1);
++
++ VDUP(vceq_vector2, , poly, p, 64, 1, 0x88);
++
++ TEST_VCOMP(vceq, , poly, p, uint, 64, 1);
++
++ CHECK(TEST_MSG, uint, 64, 1, PRIx64, vceq_expected, "");
++
++ /* vcombine_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VCOMBINE"
++
++#define TEST_VCOMBINE(T1, T2, W, N, N2) \
++ VECT_VAR(vcombine_vector128, T1, W, N2) = \
++ vcombine_##T2##W(VECT_VAR(vcombine_vector64_a, T1, W, N), \
++ VECT_VAR(vcombine_vector64_b, T1, W, N)); \
++ vst1q_##T2##W(VECT_VAR(result, T1, W, N2), VECT_VAR(vcombine_vector128, T1, W, N2))
++
++ DECL_VARIABLE(vcombine_vector64_a, poly, 64, 1);
++ DECL_VARIABLE(vcombine_vector64_b, poly, 64, 1);
++ DECL_VARIABLE(vcombine_vector128, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vcombine_vector64_a, buffer, , poly, p, 64, 1);
++
++ VDUP(vcombine_vector64_b, , poly, p, 64, 1, 0x88);
++
++ TEST_VCOMBINE(poly, p, 64, 1, 2);
++
++ CHECK(TEST_MSG, poly, 64, 2, PRIx16, vcombine_expected, "");
++
++ /* vcreate_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VCREATE"
++
++#define TEST_VCREATE(T1, T2, W, N) \
++ VECT_VAR(vcreate_vector_res, T1, W, N) = \
++ vcreate_##T2##W(VECT_VAR(vcreate_val, T1, W, N)); \
++ vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vcreate_vector_res, T1, W, N))
++
++#define DECL_VAL(VAR, T1, W, N) \
++ uint64_t VECT_VAR(VAR, T1, W, N)
++
++ DECL_VAL(vcreate_val, poly, 64, 1);
++ DECL_VARIABLE(vcreate_vector_res, poly, 64, 1);
++
++ CLEAN(result, poly, 64, 2);
++
++ VECT_VAR(vcreate_val, poly, 64, 1) = 0x123456789abcdef0ULL;
++
++ TEST_VCREATE(poly, p, 64, 1);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vcreate_expected, "");
++
++ /* vdup_lane_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VDUP_LANE/VDUP_LANEQ"
++
++#define TEST_VDUP_LANE(Q, T1, T2, W, N, N2, L) \
++ VECT_VAR(vdup_lane_vector_res, T1, W, N) = \
++ vdup##Q##_lane_##T2##W(VECT_VAR(vdup_lane_vector, T1, W, N2), L); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vdup_lane_vector_res, T1, W, N))
++
++ DECL_VARIABLE(vdup_lane_vector, poly, 64, 1);
++ DECL_VARIABLE(vdup_lane_vector, poly, 64, 2);
++ DECL_VARIABLE(vdup_lane_vector_res, poly, 64, 1);
++ DECL_VARIABLE(vdup_lane_vector_res, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vdup_lane_vector, buffer, , poly, p, 64, 1);
++
++ TEST_VDUP_LANE(, poly, p, 64, 1, 1, 0);
++ TEST_VDUP_LANE(q, poly, p, 64, 2, 1, 0);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_lane_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_lane_expected, "");
++
++ /* vdup_n_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VDUP/VDUPQ"
++
++#define TEST_VDUP(Q, T1, T2, W, N) \
++ VECT_VAR(vdup_n_vector, T1, W, N) = \
++ vdup##Q##_n_##T2##W(VECT_VAR(buffer_dup, T1, W, N)[i]); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vdup_n_vector, T1, W, N))
++
++ DECL_VARIABLE(vdup_n_vector, poly, 64, 1);
++ DECL_VARIABLE(vdup_n_vector, poly, 64, 2);
++
++ /* Try to read different places from the input buffer. */
++ for (i=0; i< 3; i++) {
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VDUP(, poly, p, 64, 1);
++ TEST_VDUP(q, poly, p, 64, 2);
++
++ switch (i) {
++ case 0:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected0, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected0, "");
++ break;
++ case 1:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected1, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected1, "");
++ break;
++ case 2:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected2, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected2, "");
++ break;
++ default:
++ abort();
++ }
++ }
++
++ /* vext_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VEXT/VEXTQ"
++
++#define TEST_VEXT(Q, T1, T2, W, N, V) \
++ VECT_VAR(vext_vector_res, T1, W, N) = \
++ vext##Q##_##T2##W(VECT_VAR(vext_vector1, T1, W, N), \
++ VECT_VAR(vext_vector2, T1, W, N), \
++ V); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vext_vector_res, T1, W, N))
++
++ DECL_VARIABLE(vext_vector1, poly, 64, 1);
++ DECL_VARIABLE(vext_vector1, poly, 64, 2);
++ DECL_VARIABLE(vext_vector2, poly, 64, 1);
++ DECL_VARIABLE(vext_vector2, poly, 64, 2);
++ DECL_VARIABLE(vext_vector_res, poly, 64, 1);
++ DECL_VARIABLE(vext_vector_res, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vext_vector1, buffer, , poly, p, 64, 1);
++ VLOAD(vext_vector1, buffer, q, poly, p, 64, 2);
++
++ VDUP(vext_vector2, , poly, p, 64, 1, 0x88);
++ VDUP(vext_vector2, q, poly, p, 64, 2, 0x88);
++
++ TEST_VEXT(, poly, p, 64, 1, 0);
++ TEST_VEXT(q, poly, p, 64, 2, 1);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vext_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vext_expected, "");
++
++ /* vget_low_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VGET_LOW"
++
++#define TEST_VGET_LOW(T1, T2, W, N, N2) \
++ VECT_VAR(vget_low_vector64, T1, W, N) = \
++ vget_low_##T2##W(VECT_VAR(vget_low_vector128, T1, W, N2)); \
++ vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vget_low_vector64, T1, W, N))
++
++ DECL_VARIABLE(vget_low_vector64, poly, 64, 1);
++ DECL_VARIABLE(vget_low_vector128, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++
++ VLOAD(vget_low_vector128, buffer, q, poly, p, 64, 2);
++
++ TEST_VGET_LOW(poly, p, 64, 1, 2);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_low_expected, "");
++
++ /* vld1_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VLD1/VLD1Q"
++
++#define TEST_VLD1(VAR, BUF, Q, T1, T2, W, N) \
++ VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
++
++ DECL_VARIABLE(vld1_vector, poly, 64, 1);
++ DECL_VARIABLE(vld1_vector, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vld1_vector, buffer, , poly, p, 64, 1);
++ VLOAD(vld1_vector, buffer, q, poly, p, 64, 2);
++
++ TEST_VLD1(vld1_vector, buffer, , poly, p, 64, 1);
++ TEST_VLD1(vld1_vector, buffer, q, poly, p, 64, 2);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_expected, "");
++
++ /* vld1_dup_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VLD1_DUP/VLD1_DUPQ"
++
++#define TEST_VLD1_DUP(VAR, BUF, Q, T1, T2, W, N) \
++ VECT_VAR(VAR, T1, W, N) = \
++ vld1##Q##_dup_##T2##W(&VECT_VAR(BUF, T1, W, N)[i]); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
++
++ DECL_VARIABLE(vld1_dup_vector, poly, 64, 1);
++ DECL_VARIABLE(vld1_dup_vector, poly, 64, 2);
++
++ /* Try to read different places from the input buffer. */
++ for (i=0; i<3; i++) {
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VLD1_DUP(vld1_dup_vector, buffer_dup, , poly, p, 64, 1);
++ TEST_VLD1_DUP(vld1_dup_vector, buffer_dup, q, poly, p, 64, 2);
++
++ switch (i) {
++ case 0:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected0, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected0, "");
++ break;
++ case 1:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected1, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected1, "");
++ break;
++ case 2:
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected2, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected2, "");
++ break;
++ default:
++ abort();
++ }
++ }
++
++ /* vld1_lane_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VLD1_LANE/VLD1_LANEQ"
++
++#define TEST_VLD1_LANE(Q, T1, T2, W, N, L) \
++ memset (VECT_VAR(vld1_lane_buffer_src, T1, W, N), 0xAA, W/8*N); \
++ VECT_VAR(vld1_lane_vector_src, T1, W, N) = \
++ vld1##Q##_##T2##W(VECT_VAR(vld1_lane_buffer_src, T1, W, N)); \
++ VECT_VAR(vld1_lane_vector, T1, W, N) = \
++ vld1##Q##_lane_##T2##W(VECT_VAR(buffer, T1, W, N), \
++ VECT_VAR(vld1_lane_vector_src, T1, W, N), L); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vld1_lane_vector, T1, W, N))
++
++ DECL_VARIABLE(vld1_lane_vector, poly, 64, 1);
++ DECL_VARIABLE(vld1_lane_vector, poly, 64, 2);
++ DECL_VARIABLE(vld1_lane_vector_src, poly, 64, 1);
++ DECL_VARIABLE(vld1_lane_vector_src, poly, 64, 2);
++
++ ARRAY(vld1_lane_buffer_src, poly, 64, 1);
++ ARRAY(vld1_lane_buffer_src, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VLD1_LANE(, poly, p, 64, 1, 0);
++ TEST_VLD1_LANE(q, poly, p, 64, 2, 0);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_lane_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_lane_expected, "");
++
++ /* vldX_p64 tests. */
++#define DECL_VLDX(T1, W, N, X) \
++ VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vldX_vector, T1, W, N, X); \
++ VECT_VAR_DECL(vldX_result_bis_##X, T1, W, N)[X * N]
++
++#define TEST_VLDX(Q, T1, T2, W, N, X) \
++ VECT_ARRAY_VAR(vldX_vector, T1, W, N, X) = \
++ /* Use dedicated init buffer, of size X. */ \
++ vld##X##Q##_##T2##W(VECT_ARRAY_VAR(buffer_vld##X, T1, W, N, X)); \
++ vst##X##Q##_##T2##W(VECT_VAR(vldX_result_bis_##X, T1, W, N), \
++ VECT_ARRAY_VAR(vldX_vector, T1, W, N, X)); \
++ memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(vldX_result_bis_##X, T1, W, N), \
++ sizeof(VECT_VAR(result, T1, W, N)));
++
++ /* Overwrite "result" with the contents of "result_bis"[Y]. */
++#define TEST_EXTRA_CHUNK(T1, W, N, X,Y) \
++ memcpy(VECT_VAR(result, T1, W, N), \
++ &(VECT_VAR(vldX_result_bis_##X, T1, W, N)[Y*N]), \
++ sizeof(VECT_VAR(result, T1, W, N)));
++
++ DECL_VLDX(poly, 64, 1, 2);
++ DECL_VLDX(poly, 64, 1, 3);
++ DECL_VLDX(poly, 64, 1, 4);
++
++ VECT_ARRAY_INIT2(buffer_vld2, poly, 64, 1);
++ PAD(buffer_vld2_pad, poly, 64, 1);
++ VECT_ARRAY_INIT3(buffer_vld3, poly, 64, 1);
++ PAD(buffer_vld3_pad, poly, 64, 1);
++ VECT_ARRAY_INIT4(buffer_vld4, poly, 64, 1);
++ PAD(buffer_vld4_pad, poly, 64, 1);
++
++#undef TEST_MSG
++#define TEST_MSG "VLD2/VLD2Q"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX(, poly, p, 64, 1, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 2, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_expected_1, "chunk 1");
++
++#undef TEST_MSG
++#define TEST_MSG "VLD3/VLD3Q"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX(, poly, p, 64, 1, 3);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 3, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_1, "chunk 1");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 3, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_2, "chunk 2");
++
++#undef TEST_MSG
++#define TEST_MSG "VLD4/VLD4Q"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX(, poly, p, 64, 1, 4);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_1, "chunk 1");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_2, "chunk 2");
++ CLEAN(result, poly, 64, 1);
++ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 3);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_3, "chunk 3");
++
++ /* vldX_dup_p64 tests. */
++#define DECL_VLDX_DUP(T1, W, N, X) \
++ VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X); \
++ VECT_VAR_DECL(vldX_dup_result_bis_##X, T1, W, N)[X * N]
++
++#define TEST_VLDX_DUP(Q, T1, T2, W, N, X) \
++ VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X) = \
++ vld##X##Q##_dup_##T2##W(&VECT_VAR(buffer_dup, T1, W, N)[0]); \
++ \
++ vst##X##Q##_##T2##W(VECT_VAR(vldX_dup_result_bis_##X, T1, W, N), \
++ VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X)); \
++ memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(vldX_dup_result_bis_##X, T1, W, N), \
++ sizeof(VECT_VAR(result, T1, W, N)));
++
++ /* Overwrite "result" with the contents of "result_bis"[Y]. */
++#define TEST_VLDX_DUP_EXTRA_CHUNK(T1, W, N, X,Y) \
++ memcpy(VECT_VAR(result, T1, W, N), \
++ &(VECT_VAR(vldX_dup_result_bis_##X, T1, W, N)[Y*N]), \
++ sizeof(VECT_VAR(result, T1, W, N)));
++
++ DECL_VLDX_DUP(poly, 64, 1, 2);
++ DECL_VLDX_DUP(poly, 64, 1, 3);
++ DECL_VLDX_DUP(poly, 64, 1, 4);
++
++
++#undef TEST_MSG
++#define TEST_MSG "VLD2_DUP/VLD2Q_DUP"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP(, poly, p, 64, 1, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_dup_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 2, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_dup_expected_1, "chunk 1");
++
++#undef TEST_MSG
++#define TEST_MSG "VLD3_DUP/VLD3Q_DUP"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP(, poly, p, 64, 1, 3);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 3, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_1, "chunk 1");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 3, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_2, "chunk 2");
++
++#undef TEST_MSG
++#define TEST_MSG "VLD4_DUP/VLD4Q_DUP"
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP(, poly, p, 64, 1, 4);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_0, "chunk 0");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 1);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_1, "chunk 1");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 2);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_2, "chunk 2");
++ CLEAN(result, poly, 64, 1);
++ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 3);
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_3, "chunk 3");
++
++ /* vsli_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VSLI"
++
++#define TEST_VSXI1(INSN, Q, T1, T2, W, N, V) \
++ VECT_VAR(vsXi_vector_res, T1, W, N) = \
++ INSN##Q##_n_##T2##W(VECT_VAR(vsXi_vector, T1, W, N), \
++ VECT_VAR(vsXi_vector2, T1, W, N), \
++ V); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vsXi_vector_res, T1, W, N))
++
++#define TEST_VSXI(INSN, Q, T1, T2, W, N, V) \
++ TEST_VSXI1(INSN, Q, T1, T2, W, N, V)
++
++ DECL_VARIABLE(vsXi_vector, poly, 64, 1);
++ DECL_VARIABLE(vsXi_vector, poly, 64, 2);
++ DECL_VARIABLE(vsXi_vector2, poly, 64, 1);
++ DECL_VARIABLE(vsXi_vector2, poly, 64, 2);
++ DECL_VARIABLE(vsXi_vector_res, poly, 64, 1);
++ DECL_VARIABLE(vsXi_vector_res, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vsXi_vector, buffer, , poly, p, 64, 1);
++ VLOAD(vsXi_vector, buffer, q, poly, p, 64, 2);
++
++ VDUP(vsXi_vector2, , poly, p, 64, 1, 2);
++ VDUP(vsXi_vector2, q, poly, p, 64, 2, 3);
++
++ TEST_VSXI(vsli, , poly, p, 64, 1, 3);
++ TEST_VSXI(vsli, q, poly, p, 64, 2, 53);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsli_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsli_expected, "");
++
++ /* Test cases with maximum shift amount. */
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VSXI(vsli, , poly, p, 64, 1, 63);
++ TEST_VSXI(vsli, q, poly, p, 64, 2, 63);
++
++#define COMMENT "(max shift amount)"
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsli_expected_max_shift, COMMENT);
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsli_expected_max_shift, COMMENT);
++
++ /* vsri_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VSRI"
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ VLOAD(vsXi_vector, buffer, , poly, p, 64, 1);
++ VLOAD(vsXi_vector, buffer, q, poly, p, 64, 2);
++
++ VDUP(vsXi_vector2, , poly, p, 64, 1, 2);
++ VDUP(vsXi_vector2, q, poly, p, 64, 2, 3);
++
++ TEST_VSXI(vsri, , poly, p, 64, 1, 3);
++ TEST_VSXI(vsri, q, poly, p, 64, 2, 53);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsri_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsri_expected, "");
++
++ /* Test cases with maximum shift amount. */
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VSXI(vsri, , poly, p, 64, 1, 64);
++ TEST_VSXI(vsri, q, poly, p, 64, 2, 64);
++
++#define COMMENT "(max shift amount)"
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsri_expected_max_shift, COMMENT);
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsri_expected_max_shift, COMMENT);
++
++ /* vst1_lane_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VST1_LANE/VST1_LANEQ"
++
++#define TEST_VST1_LANE(Q, T1, T2, W, N, L) \
++ VECT_VAR(vst1_lane_vector, T1, W, N) = \
++ vld1##Q##_##T2##W(VECT_VAR(buffer, T1, W, N)); \
++ vst1##Q##_lane_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vst1_lane_vector, T1, W, N), L)
++
++ DECL_VARIABLE(vst1_lane_vector, poly, 64, 1);
++ DECL_VARIABLE(vst1_lane_vector, poly, 64, 2);
++
++ CLEAN(result, poly, 64, 1);
++ CLEAN(result, poly, 64, 2);
++
++ TEST_VST1_LANE(, poly, p, 64, 1, 0);
++ TEST_VST1_LANE(q, poly, p, 64, 2, 0);
++
++ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vst1_lane_expected, "");
++ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vst1_lane_expected, "");
++
++ return 0;
++}
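For readers unfamiliar with the macro layer: the TEST_VLDX/TEST_EXTRA_CHUNK pair above boils down to a vld2/vst2 round-trip followed by chunk-wise comparison. Below is a minimal hand-written sketch of that round-trip, not part of the patch; it assumes an AArch64 target where the poly64 Advanced SIMD intrinsics are available, and the buffer values are arbitrary placeholders rather than the harness's real init pattern.

    #include <arm_neon.h>
    #include <stdio.h>

    int main (void)
    {
      /* Two source elements; the real harness fills its buffers via VECT_ARRAY_INIT2.  */
      poly64_t buf[2] = { 0x10, 0x11 };
      poly64_t out[2] = { 0, 0 };

      /* The core of TEST_VLDX(, poly, p, 64, 1, 2): load the interleaved pair,
         store it back, then the harness compares one chunk at a time against
         the vld2_expected_<chunk> arrays.  */
      poly64x1x2_t v = vld2_p64 (buf);
      vst2_p64 (out, v);

      printf ("chunk 0: %#llx, chunk 1: %#llx\n",
              (unsigned long long) out[0], (unsigned long long) out[1]);
      return 0;
    }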
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc
+@@ -0,0 +1,206 @@
++/* Template file for ternary scalar operator validation.
++
++ This file is meant to be included by test files for ternary scalar
++ operations. */
++
++/* Check for required settings. */
++
++#ifndef INSN_NAME
++#error INSN_NAME (the intrinsic to test) must be defined.
++#endif
++
++#ifndef INPUT_TYPE
++#error INPUT_TYPE (basic type of an input value) must be defined.
++#endif
++
++#ifndef OUTPUT_TYPE
++#error OUTPUT_TYPE (basic type of an output value) must be defined.
++#endif
++
++#ifndef OUTPUT_TYPE_SIZE
++#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
++#endif
++
++/* Optional settings:
++
++ INPUT_1: Input values for the first parameter. Must be of type INPUT_TYPE.
++ INPUT_2: Input values for the second parameter. Must be of type INPUT_TYPE.
++ INPUT_3: Input values for the third parameter. Must be of type
++ INPUT_TYPE. */
++
++#ifndef TEST_MSG
++#define TEST_MSG "unnamed test"
++#endif
++
++/* The test framework. */
++
++#include <stdio.h>
++
++extern void abort ();
++
++#define INFF __builtin_inf ()
++
++/* Stringify a macro. */
++#define STR0(A) #A
++#define STR(A) STR0 (A)
++
++/* Macro concatenation. */
++#define CAT0(A, B) A##B
++#define CAT(A, B) CAT0 (A, B)
++
++/* Format strings for error reporting. */
++#define FMT16 "0x%04x"
++#define FMT32 "0x%08x"
++#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
++
++/* Type construction: forms TS_t, where T is the base type and S the size in
++ bits. */
++#define MK_TYPE0(T, S) T##S##_t
++#define MK_TYPE(T, S) MK_TYPE0 (T, S)
++
++/* Convenience types for input and output data. */
++typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
++
++/* Conversion between typed values and their hexadecimal representation. */
++typedef union
++{
++ OUTPUT_TYPE value;
++ output_hex_type hex;
++} output_conv_type;
++
++/* Default input values. */
++
++float16_t input_1_float16_t[] =
++{
++ 0.0,
++ -0.0,
++ 2.0,
++ 3.1,
++ 20.0,
++ 0.40,
++ -2.3,
++ 1.33,
++ -7.6,
++ 0.31,
++ 0.3353,
++ 0.5,
++ 1.0,
++ 13.13,
++ -6.3,
++ 20.0,
++ (float16_t)INFF,
++ (float16_t)-INFF,
++};
++
++float16_t input_2_float16_t[] =
++{
++ 1.0,
++ 1.0,
++ -4.33,
++ 100.0,
++ 30.0,
++ -0.02,
++ 0.5,
++ -7.231,
++ -6.3,
++ 20.0,
++ -7.231,
++ 2.3,
++ -7.6,
++ 5.1,
++ 0.31,
++ 0.33353,
++ (float16_t)-INFF,
++ (float16_t)INFF,
++};
++
++float16_t input_3_float16_t[] =
++{
++ -0.0,
++ 0.0,
++ 0.31,
++ -0.31,
++ 1.31,
++ 2.1,
++ -6.3,
++ 1.0,
++ -1.5,
++ 5.1,
++ 0.3353,
++ 9.3,
++ -9.3,
++ -7.231,
++ 0.5,
++ -0.33,
++ (float16_t)INFF,
++ (float16_t)INFF,
++};
++
++#ifndef INPUT_1
++#define INPUT_1 CAT (input_1_,INPUT_TYPE)
++#endif
++
++#ifndef INPUT_2
++#define INPUT_2 CAT (input_2_,INPUT_TYPE)
++#endif
++
++#ifndef INPUT_3
++#define INPUT_3 CAT (input_3_,INPUT_TYPE)
++#endif
++
++/* Support macros and routines for the test function. */
++
++#define CHECK() \
++ { \
++ output_conv_type actual; \
++ output_conv_type expect; \
++ \
++ expect.hex = ((output_hex_type*)EXPECTED)[index]; \
++ actual.value = INSN_NAME ((INPUT_1)[index], \
++ (INPUT_2)[index], \
++ (INPUT_3)[index]); \
++ \
++ if (actual.hex != expect.hex) \
++ { \
++ fprintf (stderr, \
++ "ERROR in %s (%s line %d), buffer %s, " \
++ "index %d: got " \
++ FMT " != " FMT "\n", \
++ TEST_MSG, __FILE__, __LINE__, \
++ STR (EXPECTED), index, \
++ actual.hex, expect.hex); \
++ abort (); \
++ } \
++ fprintf (stderr, "CHECKED %s %s\n", \
++ STR (EXPECTED), TEST_MSG); \
++ }
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1 (NAME)
++
++/* The test function. */
++
++void
++FNNAME (INSN_NAME) (void)
++{
++ /* Basic test: y[i] = OP (x1[i], x2[i], x3[i]), for each set of inputs, then
++ compare the result against EXPECTED[i]. */
++
++ const int num_tests = sizeof (INPUT_1) / sizeof (INPUT_1[0]);
++ int index;
++
++ for (index = 0; index < num_tests; index++)
++ CHECK ();
++
++#ifdef EXTRA_TESTS
++ EXTRA_TESTS ();
++#endif
++}
++
++int
++main (void)
++{
++ FNNAME (INSN_NAME) ();
++
++ return 0;
++}
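The CHECK macro in the template above compares results through the hex image of a union rather than with a floating-point equality test, so sign-of-zero, NaN encodings and last-bit rounding differences are all caught. A minimal, host-buildable sketch of the same technique at 32 bits follows; the names are illustrative and not part of the harness.

    #include <stdio.h>
    #include <stdint.h>

    /* Same idea as output_conv_type: view a float through its bit pattern.  */
    typedef union { float value; uint32_t hex; } conv32;

    int main (void)
    {
      conv32 actual, expect;

      actual.value = 1.0f;
      expect.hex = 0x3f800000;   /* IEEE-754 single-precision encoding of 1.0.  */

      printf ("actual=0x%08x expect=0x%08x -> %s\n",
              (unsigned) actual.hex, (unsigned) expect.hex,
              actual.hex == expect.hex ? "match" : "MISMATCH");
      return 0;
    }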
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc
+@@ -0,0 +1,200 @@
++/* Template file for unary scalar operator validation.
++
++ This file is meant to be included by test files for unary scalar
++ operations. */
++
++/* Check for required settings. */
++
++#ifndef INSN_NAME
++#error INSN_NAME (the intrinsic to test) must be defined.
++#endif
++
++#ifndef INPUT_TYPE
++#error INPUT_TYPE (basic type of an input value) must be defined.
++#endif
++
++#ifndef SCALAR_OPERANDS
++#ifndef EXPECTED
++#error EXPECTED (an array of expected output values) must be defined.
++#endif
++#endif
++
++#ifndef OUTPUT_TYPE
++#error OUTPUT_TYPE (basic type of an output value) must be defined.
++#endif
++
++#ifndef OUTPUT_TYPE_SIZE
++#error OUTPUT_TYPE_SIZE (size in bits of an output value) must be defined.
++#endif
++
++/* Optional settings. */
++
++/* SCALAR_OPERANDS: Defined iff the intrinsic has a scalar operand.
++
++ SCALAR_1, SCALAR_2, .., SCALAR_4: If SCALAR_OPERANDS is defined, SCALAR_<n>
++ is the scalar and EXPECTED_<n> is array of expected values.
++
++ INPUT: Input values for the first parameter. Must be of type INPUT_TYPE. */
++
++/* Additional comments for the error message. */
++#ifndef COMMENT
++#define COMMENT ""
++#endif
++
++#ifndef TEST_MSG
++#define TEST_MSG "unnamed test"
++#endif
++
++/* The test framework. */
++
++#include <stdio.h>
++
++extern void abort ();
++
++#define INFF __builtin_inf ()
++
++/* Stringify a macro. */
++#define STR0(A) #A
++#define STR(A) STR0 (A)
++
++/* Macro concatenation. */
++#define CAT0(A, B) A##B
++#define CAT(A, B) CAT0 (A, B)
++
++/* Format strings for error reporting. */
++#define FMT16 "0x%04x"
++#define FMT32 "0x%08x"
++#define FMT64 "0x%016x"
++#define FMT CAT (FMT,OUTPUT_TYPE_SIZE)
++
++/* Type construction: forms TS_t, where T is the base type and S the size in
++ bits. */
++#define MK_TYPE0(T, S) T##S##_t
++#define MK_TYPE(T, S) MK_TYPE0 (T, S)
++
++/* Convenience types for input and output data. */
++typedef MK_TYPE (uint, OUTPUT_TYPE_SIZE) output_hex_type;
++
++/* Conversion between typed values and their hexadecimal representation. */
++typedef union
++{
++ OUTPUT_TYPE value;
++ output_hex_type hex;
++} output_conv_type;
++
++/* Default input values. */
++
++float16_t input_1_float16_t[] =
++{
++ 0.0, -0.0,
++ 2.0, 3.1,
++ 20.0, 0.40,
++ -2.3, 1.33,
++ -7.6, 0.31,
++ 0.3353, 0.5,
++ 1.0, 13.13,
++ -6.3, 20.0,
++ (float16_t)INFF, (float16_t)-INFF,
++};
++
++#ifndef INPUT
++#define INPUT CAT(input_1_,INPUT_TYPE)
++#endif
++
++/* Support macros and routines for the test function. */
++
++#define CHECK() \
++ { \
++ output_conv_type actual; \
++ output_conv_type expect; \
++ \
++ expect.hex = ((output_hex_type*)EXPECTED)[index]; \
++ actual.value = INSN_NAME ((INPUT)[index]); \
++ \
++ if (actual.hex != expect.hex) \
++ { \
++ fprintf (stderr, \
++ "ERROR in %s (%s line %d), buffer %s, " \
++ "index %d: got " \
++ FMT " != " FMT "\n", \
++ TEST_MSG, __FILE__, __LINE__, \
++ STR (EXPECTED), index, \
++ actual.hex, expect.hex); \
++ abort (); \
++ } \
++ fprintf (stderr, "CHECKED %s %s\n", \
++ STR (EXPECTED), TEST_MSG); \
++ }
++
++#define CHECK_N(SCALAR, EXPECTED) \
++ { \
++ output_conv_type actual; \
++ output_conv_type expect; \
++ \
++ expect.hex \
++ = ((output_hex_type*)EXPECTED)[index]; \
++ actual.value = INSN_NAME ((INPUT)[index], (SCALAR)); \
++ \
++ if (actual.hex != expect.hex) \
++ { \
++ fprintf (stderr, \
++ "ERROR in %s (%s line %d), buffer %s, " \
++ "index %d: got " \
++ FMT " != " FMT "\n", \
++ TEST_MSG, __FILE__, __LINE__, \
++ STR (EXPECTED), index, \
++ actual.hex, expect.hex); \
++ abort (); \
++ } \
++ fprintf (stderr, "CHECKED %s %s\n", \
++ STR (EXPECTED), TEST_MSG); \
++ }
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1 (NAME)
++
++/* The test function. */
++
++void
++FNNAME (INSN_NAME) (void)
++{
++ /* Basic test: y[i] = OP (x[i]), for each INPUT[i], then compare the result
++ against EXPECTED[i]. */
++
++ const int num_tests = sizeof (INPUT) / sizeof (INPUT[0]);
++ int index;
++
++ for (index = 0; index < num_tests; index++)
++ {
++#if defined (SCALAR_OPERANDS)
++
++#ifdef SCALAR_1
++ CHECK_N (SCALAR_1, EXPECTED_1);
++#endif
++#ifdef SCALAR_2
++ CHECK_N (SCALAR_2, EXPECTED_2);
++#endif
++#ifdef SCALAR_3
++ CHECK_N (SCALAR_3, EXPECTED_3);
++#endif
++#ifdef SCALAR_4
++ CHECK_N (SCALAR_4, EXPECTED_4);
++#endif
++
++#else /* !defined (SCALAR_OPERANDS).  */
++ CHECK ();
++#endif
++ }
++
++#ifdef EXTRA_TESTS
++ EXTRA_TESTS ();
++#endif
++}
++
++int
++main (void)
++{
++ FNNAME (INSN_NAME) ();
++
++ return 0;
++}
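Both scalar templates rely on the two-level STR/CAT helpers so that macro arguments such as INPUT_TYPE are expanded before # and ## are applied. A small host-buildable sketch of that mechanism follows; the array name and its single value are placeholders standing in for the real default input table.

    #include <stdio.h>

    /* Two-level helpers, as in the templates: the extra level forces the
       arguments to be macro-expanded before # and ## take effect.  */
    #define STR0(A) #A
    #define STR(A) STR0 (A)
    #define CAT0(A, B) A##B
    #define CAT(A, B) CAT0 (A, B)

    #define INPUT_TYPE float16_t

    /* Placeholder standing in for the real default input table.  */
    float input_1_float16_t[] = { 1.0f };

    int main (void)
    {
      /* CAT (input_1_, INPUT_TYPE) pastes to the identifier input_1_float16_t;
         STR (...) yields the string used in the diagnostics.  */
      printf ("%s[0] = %f\n", STR (CAT (input_1_, INPUT_TYPE)),
              CAT (input_1_, INPUT_TYPE)[0]);
      return 0;
    }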
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabd.c
+@@ -30,10 +30,20 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffd0, 0xffffffd1,
+ 0xffffffd2, 0xffffffd3 };
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x42407ae1, 0x423c7ae1,
+ 0x42387ae1, 0x42347ae1 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x4e13, 0x4dd3,
++ 0x4d93, 0x4d53 };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x5204, 0x51e4, 0x51c4, 0x51a4,
++ 0x5184, 0x5164, 0x5144, 0x5124 };
++#endif
+
+ /* Additional expected results for float32 variants with specially
+ chosen input values. */
+ VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
+
+ #define TEST_MSG "VABD/VABDQ"
+ void exec_vabd (void)
+@@ -65,6 +75,17 @@ void exec_vabd (void)
+ DECL_VABD_VAR(vector2);
+ DECL_VABD_VAR(vector_res);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector1, float, 16, 4);
++ DECL_VARIABLE(vector1, float, 16, 8);
++
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
++
+ clean_results ();
+
+ /* Initialize input "vector1" from "buffer". */
+@@ -82,6 +103,12 @@ void exec_vabd (void)
+ VLOAD(vector1, buffer, q, uint, u, 16, 8);
+ VLOAD(vector1, buffer, q, uint, u, 32, 4);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++#endif
+
+ /* Choose init value arbitrarily. */
+ VDUP(vector2, , int, s, 8, 8, 1);
+@@ -98,6 +125,10 @@ void exec_vabd (void)
+ VDUP(vector2, q, uint, u, 16, 8, 12);
+ VDUP(vector2, q, uint, u, 32, 4, 32);
+ VDUP(vector2, q, float, f, 32, 4, 32.12f);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 8.3f);
++ VDUP(vector2, q, float, f, 16, 8, 32.12f);
++#endif
+
+ /* Execute the tests. */
+ TEST_VABD(, int, s, 8, 8);
+@@ -115,6 +146,11 @@ void exec_vabd (void)
+ TEST_VABD(q, uint, u, 32, 4);
+ TEST_VABD(q, float, f, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VABD(, float, f, 16, 4);
++ TEST_VABD(q, float, f, 16, 8);
++#endif
++
+ CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
+ CHECK(TEST_MSG, int, 16, 4, PRIx16, expected, "");
+ CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+@@ -129,7 +165,10 @@ void exec_vabd (void)
+ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, "");
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
-
- __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
- vpadal_s8 (int16x4_t a, int8x8_t b)
- {
-@@ -8785,24 +8214,13 @@ vpadalq_u16 (uint32x4_t a, uint16x8_t b)
- return result;
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+
+ /* Extra FP tests with special values (-0.0, ....) */
+ VDUP(vector1, q, float, f, 32, 4, -0.0f);
+@@ -137,11 +176,27 @@ void exec_vabd (void)
+ TEST_VABD(q, float, f, 32, 4);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, " FP special (-0.0)");
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector1, q, float, f, 16, 8, -0.0f);
++ VDUP(vector2, q, float, f, 16, 8, 0.0);
++ TEST_VABD(q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16,
++ " FP special (-0.0)");
++#endif
++
+ /* Extra FP tests with special values (-0.0, ....) */
+ VDUP(vector1, q, float, f, 32, 4, 0.0f);
+ VDUP(vector2, q, float, f, 32, 4, -0.0);
+ TEST_VABD(q, float, f, 32, 4);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, " FP special (-0.0)");
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector1, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, q, float, f, 16, 8, -0.0);
++ TEST_VABD(q, float, f, 16, 8);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16,
++ " FP special (-0.0)");
++#endif
}
--__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
--vpadalq_u32 (uint64x2_t a, uint32x4_t b)
--{
-- uint64x2_t result;
-- __asm__ ("uadalp %0.2d,%2.4s"
-- : "=w"(result)
-- : "0"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
--vpadd_f32 (float32x2_t a, float32x2_t b)
-+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-+vpadalq_u32 (uint64x2_t a, uint32x4_t b)
+ int main (void)
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabdh_f16_1.c
+@@ -0,0 +1,44 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++#define INFF __builtin_inf ()
++
++/* Expected results.
++ Absolute difference between INPUT1 and INPUT2 in binary_scalar_op.inc. */
++uint16_t expected[] =
++{
++ 0x3C00,
++ 0x3C00,
++ 0x4654,
++ 0x560E,
++ 0x4900,
++ 0x36B8,
++ 0x419a,
++ 0x4848,
++ 0x3d34,
++ 0x4cec,
++ 0x4791,
++ 0x3f34,
++ 0x484d,
++ 0x4804,
++ 0x469c,
++ 0x4ceb,
++ 0x7c00,
++ 0x7c00
++};
++
++#define TEST_MSG "VABDH_F16"
++#define INSN_NAME vabdh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabs.c
+@@ -21,24 +21,52 @@ VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd };
+ /* Expected results for float32 variants. Needs to be separated since
+ the generic test function does not test floating-point
+ versions. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0x409a, 0x409a,
++ 0x409a, 0x409a };
++VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x42cd, 0x42cd,
++ 0x42cd, 0x42cd,
++ 0x42cd, 0x42cd,
++ 0x42cd, 0x42cd };
++#endif
+ VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0x40133333, 0x40133333 };
+ VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x4059999a, 0x4059999a,
+ 0x4059999a, 0x4059999a };
+
+ void exec_vabs_f32(void)
{
-- float32x2_t result;
-- __asm__ ("faddp %0.2s,%1.2s,%2.2s"
-+ uint64x2_t result;
-+ __asm__ ("uadalp %0.2d,%2.4s"
- : "=w"(result)
-- : "w"(a), "w"(b)
-+ : "0"(a), "w"(b)
- : /* No clobbers */);
- return result;
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -2.3f);
++ VDUP(vector, q, float, f, 16, 8, 3.4f);
++#endif
+ VDUP(vector, , float, f, 32, 2, -2.3f);
+ VDUP(vector, q, float, f, 32, 4, 3.4f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_UNARY_OP(INSN_NAME, , float, f, 16, 4);
++ TEST_UNARY_OP(INSN_NAME, q, float, f, 16, 8);
++#endif
+ TEST_UNARY_OP(INSN_NAME, , float, f, 32, 2);
+ TEST_UNARY_OP(INSN_NAME, q, float, f, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
}
-@@ -8939,28 +8357,6 @@ vpaddlq_u32 (uint32x4_t a)
- return result;
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4233 /* 3.099609 */,
++ 0x4d00 /* 20.000000 */,
++ 0x3666 /* 0.399902 */,
++ 0x409a /* 2.300781 */,
++ 0x3d52 /* 1.330078 */,
++ 0x479a /* 7.601562 */,
++ 0x34f6 /* 0.310059 */,
++ 0x355d /* 0.335205 */,
++ 0x3800 /* 0.500000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a91 /* 13.132812 */,
++ 0x464d /* 6.300781 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */
++};
++
++#define TEST_MSG "VABSH_F16"
++#define INSN_NAME vabsh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vadd.c
+@@ -43,6 +43,14 @@ VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff3,
+ VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0x40d9999a, 0x40d9999a };
+ VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0x41100000, 0x41100000,
+ 0x41100000, 0x41100000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0x46cd, 0x46cd,
++ 0x46cd, 0x46cd };
++VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0x4880, 0x4880,
++ 0x4880, 0x4880,
++ 0x4880, 0x4880,
++ 0x4880, 0x4880 };
++#endif
+
+ void exec_vadd_f32(void)
+ {
+@@ -66,4 +74,27 @@ void exec_vadd_f32(void)
+
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++
++ VDUP(vector, , float, f, 16, 4, 2.3f);
++ VDUP(vector, q, float, f, 16, 8, 3.4f);
++
++ VDUP(vector2, , float, f, 16, 4, 4.5f);
++ VDUP(vector2, q, float, f, 16, 8, 5.6f);
++
++ TEST_BINARY_OP(INSN_NAME, , float, f, 16, 4);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
++
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
++#endif
}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc0a8 /* -2.328125 */,
++ 0x5672 /* 103.125000 */,
++ 0x5240 /* 50.000000 */,
++ 0x3614 /* 0.379883 */,
++ 0xbf34 /* -1.800781 */,
++ 0xc5e6 /* -5.898438 */,
++ 0xcaf4 /* -13.906250 */,
++ 0x4d14 /* 20.312500 */,
++ 0xc6e5 /* -6.894531 */,
++ 0x419a /* 2.800781 */,
++ 0xc69a /* -6.601562 */,
++ 0x4c8f /* 18.234375 */,
++ 0xc5fe /* -5.992188 */,
++ 0x4d15 /* 20.328125 */,
++ 0x7e00 /* nan */,
++ 0x7e00 /* nan */,
++};
++
++#define TEST_MSG "VADDH_F16"
++#define INSN_NAME vaddh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbsl.c
+@@ -16,6 +16,10 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffff1 };
+ VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+ 0xf7, 0xf7, 0xf7, 0xf7 };
+ VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc09, 0xcb89,
++ 0xcb09, 0xca89 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800004, 0xc1700004 };
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf6, 0xf6, 0xf6, 0xf6,
+@@ -43,6 +47,12 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+ 0xf7, 0xf7, 0xf7, 0xf7 };
+ VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff2, 0xfff2,
+ 0xfff4, 0xfff4, 0xfff6, 0xfff6 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc09, 0xcb89,
++ 0xcb09, 0xca89,
++ 0xca09, 0xc989,
++ 0xc909, 0xc889 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800001, 0xc1700001,
+ 0xc1600001, 0xc1500001 };
+
+@@ -66,6 +76,10 @@ void exec_vbsl (void)
+ clean_results ();
+
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
--__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
--vpaddq_f32 (float32x4_t a, float32x4_t b)
--{
-- float32x4_t result;
-- __asm__ ("faddp %0.4s,%1.4s,%2.4s"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
--__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
--vpaddq_f64 (float64x2_t a, float64x2_t b)
--{
-- float64x2_t result;
-- __asm__ ("faddp %0.2d,%1.2d,%2.2d"
-- : "=w"(result)
-- : "w"(a), "w"(b)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
- vpaddq_s8 (int8x16_t a, int8x16_t b)
- {
-@@ -9049,17 +8445,6 @@ vpaddq_u64 (uint64x2_t a, uint64x2_t b)
- return result;
+@@ -80,6 +94,9 @@ void exec_vbsl (void)
+ VDUP(vector2, , uint, u, 16, 4, 0xFFF2);
+ VDUP(vector2, , uint, u, 32, 2, 0xFFFFFFF0);
+ VDUP(vector2, , uint, u, 64, 1, 0xFFFFFFF3);
++#if defined (FP16_SUPPORTED)
++ VDUP(vector2, , float, f, 16, 4, -2.4f); /* -2.4f is 0xC0CD. */
++#endif
+ VDUP(vector2, , float, f, 32, 2, -30.3f);
+ VDUP(vector2, , poly, p, 8, 8, 0xF3);
+ VDUP(vector2, , poly, p, 16, 4, 0xFFF2);
+@@ -94,6 +111,9 @@ void exec_vbsl (void)
+ VDUP(vector2, q, uint, u, 64, 2, 0xFFFFFFF3);
+ VDUP(vector2, q, poly, p, 8, 16, 0xF3);
+ VDUP(vector2, q, poly, p, 16, 8, 0xFFF2);
++#if defined (FP16_SUPPORTED)
++ VDUP(vector2, q, float, f, 16, 8, -2.4f);
++#endif
+ VDUP(vector2, q, float, f, 32, 4, -30.4f);
+
+ VDUP(vector_first, , uint, u, 8, 8, 0xF4);
+@@ -111,10 +131,18 @@ void exec_vbsl (void)
+ TEST_VBSL(uint, , poly, p, 16, 4);
+ TEST_VBSL(uint, q, poly, p, 8, 16);
+ TEST_VBSL(uint, q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VBSL(uint, , float, f, 16, 4);
++ TEST_VBSL(uint, q, float, f, 16, 8);
++#endif
+ TEST_VBSL(uint, , float, f, 32, 2);
+ TEST_VBSL(uint, q, float, f, 32, 4);
+
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
+ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
}
--__extension__ static __inline float32_t __attribute__ ((__always_inline__))
--vpadds_f32 (float32x2_t a)
--{
-- float32_t result;
-- __asm__ ("faddp %s0,%1.2s"
-- : "=w"(result)
-- : "w"(a)
-- : /* No clobbers */);
-- return result;
--}
--
- __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
- vqdmulh_n_s16 (int16x4_t a, int16_t b)
- {
-@@ -9679,28 +9064,6 @@ vqrdmulhq_n_s32 (int32x4_t a, int32_t b)
- result; \
- })
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcage.c
+@@ -11,3 +11,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0xffffffff, 0xffffffff };
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0xffff, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
++ 0xffff, 0xffff, 0xffff, 0x0 };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcageh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
++ 0xFFFF};
++
++#define TEST_MSG "VCAGEH_F16"
++#define INSN_NAME vcageh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagt.c
+@@ -11,3 +11,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0xffffffff, 0xffffffff };
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0xffff, 0xffff, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
++ 0xffff, 0xffff, 0x0, 0x0 };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcagth_f16_1.c
+@@ -0,0 +1,21 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0};
++
++#define TEST_MSG "VCAGTH_F16"
++#define INSN_NAME vcagth_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcale.c
+@@ -9,3 +9,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
+
+ VECT_VAR_DECL(expected2,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0xffff, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0x0, 0x0, 0xffff, 0xffff,
++ 0xffff, 0xffff, 0xffff, 0xffff };
++
++VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0xffff, 0xffff };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcaleh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
++ 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
++ 0x0, 0xFFFF, 0xFFFF};
++
++#define TEST_MSG "VCALEH_F16"
++#define INSN_NAME vcaleh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalt.c
+@@ -9,3 +9,13 @@ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
+
+ VECT_VAR_DECL(expected2,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, uint, 16, 4) [] = { 0x0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL (expected, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0xffff,
++ 0xffff, 0xffff, 0xffff, 0xffff };
++
++VECT_VAR_DECL (expected2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected2, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0xffff };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcalth_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
++ 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0x0,
++ 0x0, 0x0, 0x0};
++
++#define TEST_MSG "VCALTH_F16"
++#define INSN_NAME vcalth_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceq.c
+@@ -32,6 +32,12 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0xffff, 0x0 };
+ VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0xffff, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0xffff, 0x0,
++ 0x0, 0x0, 0x0, 0x0, };
++#endif
++
+ VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0xffffffff };
+ VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0x0 };
+
+@@ -39,6 +45,18 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0x0 };
+ VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0xffffffff };
+ VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
+ VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqh_f16_1.c
+@@ -0,0 +1,21 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
++
++#define TEST_MSG "VCEQH_F16"
++#define INSN_NAME vceqh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqz_1.c
+@@ -0,0 +1,27 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#define INSN_NAME vceqz
++#define TEST_MSG "VCEQZ/VCEQZQ"
++
++#include "cmp_zero_op.inc"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
++
++/* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vceqzh_f16_1.c
+@@ -0,0 +1,21 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};
++
++#define TEST_MSG "VCEQZH_F16"
++#define INSN_NAME vceqzh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations.  */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcge.c
+@@ -28,6 +28,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0, 0x0, 0xffff, 0xffff };
+ VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
+ VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0xffffffff };
+ VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0xffffffff, 0xffffffff };
+
+@@ -35,6 +43,20 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0xffffffff };
+ VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
+ VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgeh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
++ 0xFFFF, 0x0};
++
++#define TEST_MSG "VCGEH_F16"
++#define INSN_NAME vcgeh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgez_1.c
+@@ -0,0 +1,30 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#define INSN_NAME vcgez
++#define TEST_MSG "VCGEZ/VCGEZQ"
++
++#include "cmp_zero_op.inc"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
++/* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgezh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0,
++ 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
++ 0x0, 0xFFFF, 0xFFFF, 0x0};
++
++#define TEST_MSG "VCGEZH_F16"
++#define INSN_NAME vcgezh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations.  */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgt.c
+@@ -28,6 +28,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0xffff };
+ VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0xffff, 0xffff };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0,
++ 0x0, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
+ VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0x0, 0x0, 0x0, 0xffffffff };
+
+@@ -35,6 +43,19 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0x0, 0xffffffff };
+ VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0x0, 0xffffffff };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
++
+ VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgth_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
++ 0xFFFF, 0x0};
++
++#define TEST_MSG "VCGTH_F16"
++#define INSN_NAME vcgth_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtz_1.c
+@@ -0,0 +1,28 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#define INSN_NAME vcgtz
++#define TEST_MSG "VCGTZ/VCGTZQ"
++
++#include "cmp_zero_op.inc"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
++/* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcgtzh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0x0,
++ 0xFFFF, 0xFFFF, 0x0};
++
++#define TEST_MSG "VCGTZH_F16"
++#define INSN_NAME vcgtzh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations.  */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcle.c
+@@ -31,6 +31,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
+ VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0xffffffff, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
++ 0xffff, 0x0,
++ 0x0, 0x0,
++ 0x0, 0x0 };
++#endif
++
+ VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0xffffffff, 0x0 };
+@@ -39,6 +47,20 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0xffffffff, 0x0 };
+ VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0xffffffff, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
++
+ VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcleh_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0,
++ 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0x0,
++ 0xFFFF};
++
++#define TEST_MSG "VCLEH_F16"
++#define INSN_NAME vcleh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclez_1.c
+@@ -0,0 +1,29 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#define INSN_NAME vclez
++#define TEST_MSG "VCLEZ/VCLEZQ"
++
++#include "cmp_zero_op.inc"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
++
++/* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclezh_f16_1.c
+@@ -0,0 +1,21 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF};
++
++#define TEST_MSG "VCLEZH_F16"
++#define INSN_NAME vclezh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations.  */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclt.c
+@@ -30,6 +30,14 @@ VECT_VAR_DECL(expected_q_uint,uint,16,8) [] = { 0xffff, 0xffff, 0xffff, 0xffff,
+ VECT_VAR_DECL(expected_q_uint,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0x0, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0xffff, 0xffff,
++ 0x0, 0x0,
++ 0x0, 0x0,
++ 0x0, 0x0 };
++#endif
++
+ VECT_VAR_DECL(expected_float,uint,32,2) [] = { 0xffffffff, 0x0 };
+ VECT_VAR_DECL(expected_q_float,uint,32,4) [] = { 0xffffffff, 0xffffffff,
+ 0x0, 0x0 };
+@@ -38,6 +46,19 @@ VECT_VAR_DECL(expected_uint2,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_uint3,uint,32,2) [] = { 0xffffffff, 0x0 };
+ VECT_VAR_DECL(expected_uint4,uint,32,2) [] = { 0x0, 0x0 };
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_nan2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf2, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
++
+ VECT_VAR_DECL(expected_nan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_mnan,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_nan2,uint,32,2) [] = { 0x0, 0x0 };
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vclth_f16_1.c
+@@ -0,0 +1,22 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0,
++ 0xFFFF, 0xFFFF, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF, 0x0, 0x0,
++ 0xFFFF};
++
++#define TEST_MSG "VCLTH_F16"
++#define INSN_NAME vclth_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltz_1.c
+@@ -0,0 +1,27 @@
++/* This file tests an intrinsic which currently has only an f16 variant, and
++   which is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#define INSN_NAME vcltz
++#define TEST_MSG "VCLTZ/VCLTZQ"
++
++#include "cmp_zero_op.inc"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_float, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_q_float, uint, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
++
++/* Extra FP tests with special values (NaN, ....). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected_nan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mnan, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_inf, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++
++VECT_VAR_DECL (expected_minf, uint, 16, 4) [] = { 0xffff, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL (expected_zero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL (expected_mzero, uint, 16, 4) [] = { 0x0, 0x0, 0x0, 0x0 };
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcltzh_f16_1.c
+@@ -0,0 +1,21 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] = { 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0xFFFF,
++ 0x0, 0x0, 0x0, 0x0, 0x0, 0xFFFF, 0x0, 0x0, 0xFFFF};
++
++#define TEST_MSG "VCltZH_F16"
++#define INSN_NAME vcltzh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt.c
+@@ -4,36 +4,99 @@
+ #include <math.h>
+
+ /* Expected results for vcvt. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_s, hfloat, 16, 4) [] =
++{ 0xcc00, 0xcb80, 0xcb00, 0xca80 };
++VECT_VAR_DECL(expected_u, hfloat, 16, 4) [] =
++{ 0x7c00, 0x7c00, 0x7c00, 0x7c00, };
++VECT_VAR_DECL(expected_s, hfloat, 16, 8) [] =
++{ 0xcc00, 0xcb80, 0xcb00, 0xca80,
++ 0xca00, 0xc980, 0xc900, 0xc880 };
++VECT_VAR_DECL(expected_u, hfloat, 16, 8) [] =
++{ 0x7c00, 0x7c00, 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00, 0x7c00, 0x7c00, };
++#endif
+ VECT_VAR_DECL(expected_s,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected_u,hfloat,32,2) [] = { 0x4f800000, 0x4f800000 };
+ VECT_VAR_DECL(expected_s,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+- 0xc1600000, 0xc1500000 };
++ 0xc1600000, 0xc1500000 };
+ VECT_VAR_DECL(expected_u,hfloat,32,4) [] = { 0x4f800000, 0x4f800000,
+- 0x4f800000, 0x4f800000 };
++ 0x4f800000, 0x4f800000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x5, 0xfff1, 0x5 };
++VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
++VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff1,
++ 0x0, 0x0, 0xf, 0xfff1 };
++VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
++ 0x0, 0x0, 0xf, 0x0 };
++#endif
+ VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff1, 0x5 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0x0, 0x5 };
+ VECT_VAR_DECL(expected,int,32,4) [] = { 0x0, 0x0, 0xf, 0xfffffff1 };
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x0, 0xf, 0x0 };
+
+ /* Expected results for vcvt_n. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_vcvt_n_s, hfloat, 16, 4) [] = { 0xc400, 0xc380,
++ 0xc300, 0xc280 };
++VECT_VAR_DECL(expected_vcvt_n_u, hfloat, 16, 4) [] = { 0x6000, 0x6000,
++ 0x6000, 0x6000 };
++VECT_VAR_DECL(expected_vcvt_n_s, hfloat, 16, 8) [] = { 0xb000, 0xaf80,
++ 0xaf00, 0xae80,
++ 0xae00, 0xad80,
++ 0xad00, 0xac80 };
++VECT_VAR_DECL(expected_vcvt_n_u, hfloat, 16, 8) [] = { 0x4c00, 0x4c00,
++ 0x4c00, 0x4c00,
++ 0x4c00, 0x4c00,
++ 0x4c00, 0x4c00 };
++#endif
+ VECT_VAR_DECL(expected_vcvt_n_s,hfloat,32,2) [] = { 0xc0800000, 0xc0700000 };
+ VECT_VAR_DECL(expected_vcvt_n_u,hfloat,32,2) [] = { 0x4c000000, 0x4c000000 };
+ VECT_VAR_DECL(expected_vcvt_n_s,hfloat,32,4) [] = { 0xb2800000, 0xb2700000,
+ 0xb2600000, 0xb2500000 };
+ VECT_VAR_DECL(expected_vcvt_n_u,hfloat,32,4) [] = { 0x49800000, 0x49800000,
+ 0x49800000, 0x49800000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_vcvt_n, int, 16, 4) [] = { 0xffc3, 0x15,
++ 0xffc3, 0x15 };
++VECT_VAR_DECL(expected_vcvt_n, uint, 16, 4) [] = { 0x0, 0x2a6, 0x0, 0x2a6 };
++VECT_VAR_DECL(expected_vcvt_n, int, 16, 8) [] = { 0x0, 0x0, 0x78f, 0xf871,
++ 0x0, 0x0, 0x78f, 0xf871 };
++VECT_VAR_DECL(expected_vcvt_n, uint, 16, 8) [] = { 0x0, 0x0, 0xf1e0, 0x0,
++ 0x0, 0x0, 0xf1e0, 0x0 };
++#endif
+ VECT_VAR_DECL(expected_vcvt_n,int,32,2) [] = { 0xff0b3333, 0x54cccd };
+ VECT_VAR_DECL(expected_vcvt_n,uint,32,2) [] = { 0x0, 0x15 };
+ VECT_VAR_DECL(expected_vcvt_n,int,32,4) [] = { 0x0, 0x0, 0x1e3d7, 0xfffe1c29 };
+ VECT_VAR_DECL(expected_vcvt_n,uint,32,4) [] = { 0x0, 0x0, 0x1e, 0x0 };
+
+ /* Expected results for vcvt with rounding. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
++ 0x7d, 0x7d, 0x7d, 0x7d };
++VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
++ 0x7d, 0x7d, 0x7d, 0x7d };
++#endif
+ VECT_VAR_DECL(expected_rounding,int,32,2) [] = { 0xa, 0xa };
+ VECT_VAR_DECL(expected_rounding,uint,32,2) [] = { 0xa, 0xa };
+ VECT_VAR_DECL(expected_rounding,int,32,4) [] = { 0x7d, 0x7d, 0x7d, 0x7d };
+ VECT_VAR_DECL(expected_rounding,uint,32,4) [] = { 0x7d, 0x7d, 0x7d, 0x7d };
+
+ /* Expected results for vcvt_n with rounding. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_vcvt_n_rounding, int, 16, 4) [] =
++{ 0x533, 0x533, 0x533, 0x533 };
++VECT_VAR_DECL(expected_vcvt_n_rounding, uint, 16, 4) [] =
++{ 0x533, 0x533, 0x533, 0x533 };
++VECT_VAR_DECL(expected_vcvt_n_rounding, int, 16, 8) [] =
++{ 0x7fff, 0x7fff, 0x7fff, 0x7fff,
++ 0x7fff, 0x7fff, 0x7fff, 0x7fff };
++VECT_VAR_DECL(expected_vcvt_n_rounding, uint, 16, 8) [] =
++{ 0xffff, 0xffff, 0xffff, 0xffff,
++ 0xffff, 0xffff, 0xffff, 0xffff };
++#endif
+ VECT_VAR_DECL(expected_vcvt_n_rounding,int,32,2) [] = { 0xa66666, 0xa66666 };
+ VECT_VAR_DECL(expected_vcvt_n_rounding,uint,32,2) [] = { 0xa66666, 0xa66666 };
+ VECT_VAR_DECL(expected_vcvt_n_rounding,int,32,4) [] = { 0xfbccc, 0xfbccc,
+@@ -42,11 +105,17 @@ VECT_VAR_DECL(expected_vcvt_n_rounding,uint,32,4) [] = { 0xfbccc, 0xfbccc,
+ 0xfbccc, 0xfbccc };
+
+ /* Expected results for vcvt_n with saturation. */
+-VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,2) [] = { 0x7fffffff,
+- 0x7fffffff };
+-VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,4) [] = { 0x7fffffff,
+- 0x7fffffff,
+- 0x7fffffff, 0x7fffffff };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_vcvt_n_saturation, int, 16, 4) [] =
++{ 0x533, 0x533, 0x533, 0x533 };
++VECT_VAR_DECL(expected_vcvt_n_saturation, int, 16, 8) [] =
++{ 0x7fff, 0x7fff, 0x7fff, 0x7fff,
++ 0x7fff, 0x7fff, 0x7fff, 0x7fff };
++#endif
++VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,2) [] =
++{ 0x7fffffff, 0x7fffffff };
++VECT_VAR_DECL(expected_vcvt_n_saturation,int,32,4) [] =
++{ 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff };
+ #define TEST_MSG "VCVT/VCVTQ"
+ void exec_vcvt (void)
+@@ -89,11 +158,26 @@ void exec_vcvt (void)
+ /* Initialize input "vector" from "buffer". */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+ /* Make sure some elements have a fractional part, to exercise
+ integer conversions. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VSET_LANE(vector, , float, f, 16, 4, 0, -15.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 1, 5.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 2, -15.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 3, 5.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 4, -15.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 5, 5.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 6, -15.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 7, 5.3f);
++#endif
++
+ VSET_LANE(vector, , float, f, 32, 2, 0, -15.3f);
+ VSET_LANE(vector, , float, f, 32, 2, 1, 5.3f);
+ VSET_LANE(vector, q, float, f, 32, 4, 2, -15.3f);
+@@ -103,23 +187,55 @@ void exec_vcvt (void)
+ before overwriting them. */
+ #define TEST_MSG2 ""
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_f16_xx. */
++ TEST_VCVT_FP(, float, f, 16, 4, int, s, expected_s);
++ TEST_VCVT_FP(, float, f, 16, 4, uint, u, expected_u);
++#endif
+ /* vcvt_f32_xx. */
+ TEST_VCVT_FP(, float, f, 32, 2, int, s, expected_s);
+ TEST_VCVT_FP(, float, f, 32, 2, uint, u, expected_u);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_f16_xx. */
++ TEST_VCVT_FP(q, float, f, 16, 8, int, s, expected_s);
++ TEST_VCVT_FP(q, float, f, 16, 8, uint, u, expected_u);
++#endif
+ /* vcvtq_f32_xx. */
+ TEST_VCVT_FP(q, float, f, 32, 4, int, s, expected_s);
+ TEST_VCVT_FP(q, float, f, 32, 4, uint, u, expected_u);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_xx_f16. */
++ TEST_VCVT(, int, s, 16, 4, float, f, expected);
++ TEST_VCVT(, uint, u, 16, 4, float, f, expected);
++#endif
+ /* vcvt_xx_f32. */
+ TEST_VCVT(, int, s, 32, 2, float, f, expected);
+ TEST_VCVT(, uint, u, 32, 2, float, f, expected);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VSET_LANE(vector, q, float, f, 16, 8, 0, 0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 1, -0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 2, 15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 3, -15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 4, 0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 5, -0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 6, 15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 7, -15.12f);
++#endif
++
+ VSET_LANE(vector, q, float, f, 32, 4, 0, 0.0f);
+ VSET_LANE(vector, q, float, f, 32, 4, 1, -0.0f);
+ VSET_LANE(vector, q, float, f, 32, 4, 2, 15.12f);
+ VSET_LANE(vector, q, float, f, 32, 4, 3, -15.12f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_xx_f16. */
++ TEST_VCVT(q, int, s, 16, 8, float, f, expected);
++ TEST_VCVT(q, uint, u, 16, 8, float, f, expected);
++#endif
++
+ /* vcvtq_xx_f32. */
+ TEST_VCVT(q, int, s, 32, 4, float, f, expected);
+ TEST_VCVT(q, uint, u, 32, 4, float, f, expected);
+@@ -129,18 +245,38 @@ void exec_vcvt (void)
+ #undef TEST_MSG
+ #define TEST_MSG "VCVT_N/VCVTQ_N"
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_n_f16_xx. */
++ TEST_VCVT_N_FP(, float, f, 16, 4, int, s, 2, expected_vcvt_n_s);
++ TEST_VCVT_N_FP(, float, f, 16, 4, uint, u, 7, expected_vcvt_n_u);
++#endif
+ /* vcvt_n_f32_xx. */
+ TEST_VCVT_N_FP(, float, f, 32, 2, int, s, 2, expected_vcvt_n_s);
+ TEST_VCVT_N_FP(, float, f, 32, 2, uint, u, 7, expected_vcvt_n_u);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_n_f16_xx. */
++ TEST_VCVT_N_FP(q, float, f, 16, 8, int, s, 7, expected_vcvt_n_s);
++ TEST_VCVT_N_FP(q, float, f, 16, 8, uint, u, 12, expected_vcvt_n_u);
++#endif
+ /* vcvtq_n_f32_xx. */
+ TEST_VCVT_N_FP(q, float, f, 32, 4, int, s, 30, expected_vcvt_n_s);
+ TEST_VCVT_N_FP(q, float, f, 32, 4, uint, u, 12, expected_vcvt_n_u);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_n_xx_f16. */
++ TEST_VCVT_N(, int, s, 16, 4, float, f, 2, expected_vcvt_n);
++ TEST_VCVT_N(, uint, u, 16, 4, float, f, 7, expected_vcvt_n);
++#endif
+ /* vcvt_n_xx_f32. */
+ TEST_VCVT_N(, int, s, 32, 2, float, f, 20, expected_vcvt_n);
+ TEST_VCVT_N(, uint, u, 32, 2, float, f, 2, expected_vcvt_n);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_n_xx_f16. */
++ TEST_VCVT_N(q, int, s, 16, 8, float, f, 7, expected_vcvt_n);
++ TEST_VCVT_N(q, uint, u, 16, 8, float, f, 12, expected_vcvt_n);
++#endif
+ /* vcvtq_n_xx_f32. */
+ TEST_VCVT_N(q, int, s, 32, 4, float, f, 13, expected_vcvt_n);
+ TEST_VCVT_N(q, uint, u, 32, 4, float, f, 1, expected_vcvt_n);
+@@ -150,20 +286,49 @@ void exec_vcvt (void)
+ #define TEST_MSG "VCVT/VCVTQ"
+ #undef TEST_MSG2
+ #define TEST_MSG2 "(check rounding)"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 10.4f);
++ VDUP(vector, q, float, f, 16, 8, 125.9f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 10.4f);
+ VDUP(vector, q, float, f, 32, 4, 125.9f);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_xx_f16. */
++ TEST_VCVT(, int, s, 16, 4, float, f, expected_rounding);
++ TEST_VCVT(, uint, u, 16, 4, float, f, expected_rounding);
++#endif
+ /* vcvt_xx_f32. */
+ TEST_VCVT(, int, s, 32, 2, float, f, expected_rounding);
+ TEST_VCVT(, uint, u, 32, 2, float, f, expected_rounding);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_xx_f16. */
++ TEST_VCVT(q, int, s, 16, 8, float, f, expected_rounding);
++ TEST_VCVT(q, uint, u, 16, 8, float, f, expected_rounding);
++#endif
+ /* vcvtq_xx_f32. */
+ TEST_VCVT(q, int, s, 32, 4, float, f, expected_rounding);
+ TEST_VCVT(q, uint, u, 32, 4, float, f, expected_rounding);
+
+ #undef TEST_MSG
+ #define TEST_MSG "VCVT_N/VCVTQ_N"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_n_xx_f16. */
++ TEST_VCVT_N(, int, s, 16, 4, float, f, 7, expected_vcvt_n_rounding);
++ TEST_VCVT_N(, uint, u, 16, 4, float, f, 7, expected_vcvt_n_rounding);
++#endif
+ /* vcvt_n_xx_f32. */
+ TEST_VCVT_N(, int, s, 32, 2, float, f, 20, expected_vcvt_n_rounding);
+ TEST_VCVT_N(, uint, u, 32, 2, float, f, 20, expected_vcvt_n_rounding);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_n_xx_f16. */
++ TEST_VCVT_N(q, int, s, 16, 8, float, f, 13, expected_vcvt_n_rounding);
++ TEST_VCVT_N(q, uint, u, 16, 8, float, f, 13, expected_vcvt_n_rounding);
++#endif
+ /* vcvtq_n_xx_f32. */
+ TEST_VCVT_N(q, int, s, 32, 4, float, f, 13, expected_vcvt_n_rounding);
+ TEST_VCVT_N(q, uint, u, 32, 4, float, f, 13, expected_vcvt_n_rounding);
+@@ -172,8 +337,18 @@ void exec_vcvt (void)
+ #define TEST_MSG "VCVT_N/VCVTQ_N"
+ #undef TEST_MSG2
+ #define TEST_MSG2 "(check saturation)"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt_n_xx_f16. */
++ TEST_VCVT_N(, int, s, 16, 4, float, f, 7, expected_vcvt_n_saturation);
++#endif
+ /* vcvt_n_xx_f32. */
+ TEST_VCVT_N(, int, s, 32, 2, float, f, 31, expected_vcvt_n_saturation);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvtq_n_xx_f16. */
++ TEST_VCVT_N(q, int, s, 16, 8, float, f, 13, expected_vcvt_n_saturation);
++#endif
+ /* vcvtq_n_xx_f32. */
+ TEST_VCVT_N(q, int, s, 32, 4, float, f, 31, expected_vcvt_n_saturation);
+ }
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtX.inc
+@@ -0,0 +1,113 @@
++/* Template file for VCVT operator validation.
++
++ This file is meant to be included by the relevant test files, which
++ have to define the intrinsic family to test. If a given intrinsic
++ supports variants which are not supported by all the other vcvt
++ operators, these can be tested by providing a definition for
++ EXTRA_TESTS.
++
++ This file is only used for VCVT? tests, which currently have only f16 to
++ integer variants. It is based on vcvt.c. */
++
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1 (NAME)
++
++void FNNAME (INSN_NAME) (void)
++{
++ int i;
++
++ /* Basic test: y=vcvt(x), then store the result. */
++#define TEST_VCVT1(INSN, Q, T1, T2, W, N, TS1, TS2, EXP) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ INSN##Q##_##T2##W##_##TS2##W(VECT_VAR(vector, TS1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vector_res, T1, W, N)); \
++ CHECK(TEST_MSG, T1, W, N, PRIx##W, EXP, TEST_MSG2);
++
++#define TEST_VCVT(INSN, Q, T1, T2, W, N, TS1, TS2, EXP) \
++ TEST_VCVT1 (INSN, Q, T1, T2, W, N, TS1, TS2, EXP)
++
++ DECL_VARIABLE_ALL_VARIANTS(vector);
++ DECL_VARIABLE_ALL_VARIANTS(vector_res);
++
++ clean_results ();
++
++ /* Initialize input "vector" from "buffer". */
++ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++
++ /* Make sure some elements have a fractional part, to exercise
++ integer conversions. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VSET_LANE(vector, , float, f, 16, 4, 0, -15.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 1, 5.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 2, -15.3f);
++ VSET_LANE(vector, , float, f, 16, 4, 3, 5.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 4, -15.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 5, 5.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 6, -15.3f);
++ VSET_LANE(vector, q, float, f, 16, 8, 7, 5.3f);
++#endif
++
++ /* The same result buffers are used multiple times, so we check them
++ before overwriting them. */
++#define TEST_MSG2 ""
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt?_xx_f16. */
++ TEST_VCVT(INSN_NAME, , int, s, 16, 4, float, f, expected);
++ TEST_VCVT(INSN_NAME, , uint, u, 16, 4, float, f, expected);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VSET_LANE(vector, q, float, f, 16, 8, 0, 0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 1, -0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 2, 15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 3, -15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 4, 0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 5, -0.0f);
++ VSET_LANE(vector, q, float, f, 16, 8, 6, 15.12f);
++ VSET_LANE(vector, q, float, f, 16, 8, 7, -15.12f);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt?q_xx_f16. */
++ TEST_VCVT(INSN_NAME, q, int, s, 16, 8, float, f, expected);
++ TEST_VCVT(INSN_NAME, q, uint, u, 16, 8, float, f, expected);
++#endif
++
++ /* Check rounding. */
++#undef TEST_MSG2
++#define TEST_MSG2 "(check rounding)"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 10.4f);
++ VDUP(vector, q, float, f, 16, 8, 125.9f);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt?_xx_f16. */
++ TEST_VCVT(INSN_NAME, , int, s, 16, 4, float, f, expected_rounding);
++ TEST_VCVT(INSN_NAME, , uint, u, 16, 4, float, f, expected_rounding);
++#endif
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ /* vcvt?q_xx_f16. */
++ TEST_VCVT(INSN_NAME, q, int, s, 16, 8, float, f, expected_rounding);
++ TEST_VCVT(INSN_NAME, q, uint, u, 16, 8, float, f, expected_rounding);
++#endif
++
++#ifdef EXTRA_TESTS
++ EXTRA_TESTS();
++#endif
++}
++
++int
++main (void)
++{
++ FNNAME (INSN_NAME) ();
++ return 0;
++}
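The FNNAME/FNNAME1 pair above is the usual two-level token-pasting idiom: a
driver file defines INSN_NAME (vcvta in the file added next) and the template
turns it into an exec_<insn> entry point that main () calls. A tiny standalone
sketch of that expansion, not part of the patch and using stdio purely for
illustration:

#include <stdio.h>

#define INSN_NAME vcvta             /* what a driver such as vcvta_1.c defines */
#define FNNAME1(NAME) exec_ ## NAME
#define FNNAME(NAME) FNNAME1 (NAME) /* extra level so INSN_NAME expands first */

void FNNAME (INSN_NAME) (void)      /* defines exec_vcvta */
{
  puts ("exec_vcvta called");
}

int
main (void)
{
  FNNAME (INSN_NAME) ();            /* calls exec_vcvta */
  return 0;
}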
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvta_1.c
+@@ -0,0 +1,33 @@
++/* This file tests an intrinsic which currently has only an f16 variant, and
++   which is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++#include <math.h>
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x5, 0xfff1, 0x5 };
++VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
++VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff1,
++ 0x0, 0x0, 0xf, 0xfff1 };
++VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
++ 0x0, 0x0, 0xf, 0x0 };
++#endif
++
++/* Expected results with rounding. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
++ 0x7e, 0x7e, 0x7e, 0x7e };
++VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
++ 0x7e, 0x7e, 0x7e, 0x7e };
++#endif
++
++#define TEST_MSG "VCVTA/VCVTAQ"
++#define INSN_NAME vcvta
++
++#include "vcvtX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int16_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
++
++#define TEST_MSG "VCVTAH_S16_F16"
++#define INSN_NAME vcvtah_s16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
++{
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
++
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0xfffffdc8,
++ 0xffffffdd,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0xfffffffb,
++ 0x0000004d,
++ 0xffffff6f,
++ 0xffffffc7,
++ 0xfffffff0,
++ 0xfffffff1,
++ 0xfffffff2,
++ 0xfffffff3
++};
++
++#define TEST_MSG "VCVTAH_S32_F16"
++#define INSN_NAME vcvtah_s32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_s64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int64_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
++
++#define TEST_MSG "VCVTAH_S64_F16"
++#define INSN_NAME vcvtah_s64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint16_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
++
++#define TEST_MSG "VCVTAH_u16_F16"
++#define INSN_NAME vcvtah_u16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
++{
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
++
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0x00000000,
++ 0x00000000,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0x00000000,
++ 0x0000004d,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000
++};
++
++#define TEST_MSG "VCVTAH_U32_F16"
++#define INSN_NAME vcvtah_u32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtah_u64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint64_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
++
++#define TEST_MSG "VCVTAH_u64_F16"
++#define INSN_NAME vcvtah_u64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s16_1.c
+@@ -0,0 +1,25 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++int16_t input[] = { 123, -567, 0, 1024, -63, 169, -4, 77 };
++uint16_t expected[] = { 0x57B0 /* 123.0. */, 0xE06E /* -567.0. */,
++ 0x0000 /* 0.0. */, 0x6400 /* 1024. */,
++ 0xD3E0 /* -63. */, 0x5948 /* 169. */,
++ 0xC400 /* -4. */, 0x54D0 /* 77. */ };
++
++#define TEST_MSG "VCVTH_F16_S16"
++#define INSN_NAME vcvth_f16_s16
++
++#define EXPECTED expected
++
++#define INPUT input
++#define INPUT_TYPE int16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c
+@@ -0,0 +1,52 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++uint32_t input[] =
++{
++ 0, -0,
++ 123, -567,
++ -34, 1024,
++ -63, 169,
++ -4, 77,
++ -144, -56,
++ -16, -15,
++ -14, -13,
++};
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x57b0 /* 123.000000 */,
++ 0xe06e /* -567.000000 */,
++ 0xd040 /* -34.000000 */,
++ 0x6400 /* 1024.000000 */,
++ 0xd3e0 /* -63.000000 */,
++ 0x5948 /* 169.000000 */,
++ 0xc400 /* -4.000000 */,
++ 0x54d0 /* 77.000000 */,
++ 0xd880 /* -144.000000 */,
++ 0xd300 /* -56.000000 */,
++ 0xcc00 /* -16.000000 */,
++ 0xcb80 /* -15.000000 */,
++ 0xcb00 /* -14.000000 */,
++ 0xca80 /* -13.000000 */
++};
++
++#define TEST_MSG "VCVTH_F16_S32"
++#define INSN_NAME vcvth_f16_s32
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE uint32_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s64_1.c
+@@ -0,0 +1,25 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++int64_t input[] = { 123, -567, 0, 1024, -63, 169, -4, 77 };
++uint16_t expected[] = { 0x57B0 /* 123.0. */, 0xE06E /* -567.0. */,
++ 0x0000 /* 0.0. */, 0x6400 /* 1024. */,
++ 0xD3E0 /* -63. */, 0x5948 /* 169. */,
++ 0xC400 /* -4. */, 0x54D0 /* 77. */ };
++
++#define TEST_MSG "VCVTH_F16_S64"
++#define INSN_NAME vcvth_f16_s64
++
++#define EXPECTED expected
++
++#define INPUT input
++#define INPUT_TYPE int64_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u16_1.c
+@@ -0,0 +1,25 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint16_t input[] = { 123, 567, 0, 1024, 63, 169, 4, 77 };
++uint16_t expected[] = { 0x57B0 /* 123.0. */, 0x606E /* 567.0. */,
++ 0x0000 /* 0.0. */, 0x6400 /* 1024.0. */,
++ 0x53E0 /* 63.0. */, 0x5948 /* 169.0. */,
++ 0x4400 /* 4.0. */, 0x54D0 /* 77.0. */ };
++
++#define TEST_MSG "VCVTH_F16_U16"
++#define INSN_NAME vcvth_f16_u16
++
++#define EXPECTED expected
++
++#define INPUT input
++#define INPUT_TYPE uint16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c
+@@ -0,0 +1,52 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++int32_t input[] =
++{
++ 0, -0,
++ 123, -567,
++ -34, 1024,
++ -63, 169,
++ -4, 77,
++ -144, -56,
++ -16, -15,
++ -14, -13,
++};
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x57b0 /* 123.000000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x6400 /* 1024.000000 */,
++ 0x7c00 /* inf */,
++ 0x5948 /* 169.000000 */,
++ 0x7c00 /* inf */,
++ 0x54d0 /* 77.000000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */
++};
++
++#define TEST_MSG "VCVTH_F16_U32"
++#define INSN_NAME vcvth_f16_u32
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE int32_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u64_1.c
+@@ -0,0 +1,25 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++uint64_t input[] = { 123, 567, 0, 1024, 63, 169, 4, 77 };
++uint16_t expected[] = { 0x57B0 /* 123.0. */, 0x606E /* 567.0. */,
++ 0x0000 /* 0.0. */, 0x6400 /* 1024.0. */,
++ 0x53E0 /* 63.0. */, 0x5948 /* 169.0. */,
++ 0x4400 /* 4.0. */, 0x54D0 /* 77.0. */ };
++
++#define TEST_MSG "VCVTH_F16_U64"
++#define INSN_NAME vcvth_f16_u64
++
++#define EXPECTED expected
++
++#define INPUT input
++#define INPUT_TYPE uint64_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s16_1.c
+@@ -0,0 +1,46 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++int16_t input[] = { 1, 10, 48, 100, -1, -10, 7, -7 };
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] = { 0x3800 /* 0.5. */,
++ 0x4500 /* 5. */,
++ 0x4E00 /* 24. */,
++ 0x5240 /* 50. */,
++ 0xB800 /* -0.5. */,
++ 0xC500 /* -5. */,
++ 0x4300 /* 3.5. */,
++ 0xC300 /* -3.5. */ };
++
++uint16_t expected_2[] = { 0x3400 /* 0.25. */,
++ 0x4100 /* 2.5. */,
++ 0x4A00 /* 12. */,
++ 0x4E40 /* 25. */,
++ 0xB400 /* -0.25. */,
++ 0xC100 /* -2.5. */,
++ 0x3F00 /* 1.75. */,
++ 0xBF00 /* -1.75. */ };
++
++#define TEST_MSG "VCVTH_N_F16_S16"
++#define INSN_NAME vcvth_n_f16_s16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE int16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c
+@@ -0,0 +1,99 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++uint32_t input[] =
++{
++ 0, -0,
++ 123, -567,
++ -34, 1024,
++ -63, 169,
++ -4, 77,
++ -144, -56,
++ -16, -15,
++ -14, -13,
++};
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x53b0 /* 61.500000 */,
++ 0xdc6e /* -283.500000 */,
++ 0xcc40 /* -17.000000 */,
++ 0x6000 /* 512.000000 */,
++ 0xcfe0 /* -31.500000 */,
++ 0x5548 /* 84.500000 */,
++ 0xc000 /* -2.000000 */,
++ 0x50d0 /* 38.500000 */,
++ 0xd480 /* -72.000000 */,
++ 0xcf00 /* -28.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0xc780 /* -7.500000 */,
++ 0xc700 /* -7.000000 */,
++ 0xc680 /* -6.500000 */
++};
++
++uint16_t expected_2[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x4fb0 /* 30.750000 */,
++ 0xd86e /* -141.750000 */,
++ 0xc840 /* -8.500000 */,
++ 0x5c00 /* 256.000000 */,
++ 0xcbe0 /* -15.750000 */,
++ 0x5148 /* 42.250000 */,
++ 0xbc00 /* -1.000000 */,
++ 0x4cd0 /* 19.250000 */,
++ 0xd080 /* -36.000000 */,
++ 0xcb00 /* -14.000000 */,
++ 0xc400 /* -4.000000 */,
++ 0xc380 /* -3.750000 */,
++ 0xc300 /* -3.500000 */,
++ 0xc280 /* -3.250000 */
++};
++
++uint16_t expected_3[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x8002 /* -0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x0004 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x0001 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x8001 /* -0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x8000 /* -0.000000 */
++};
++
++#define TEST_MSG "VCVTH_N_F16_S32"
++#define INSN_NAME vcvth_n_f16_s32
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++#define EXPECTED_3 expected_3
++
++#define INPUT_TYPE int32_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++#define SCALAR_3 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s64_1.c
+@@ -0,0 +1,46 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++int64_t input[] = { 1, 10, 48, 100, -1, -10, 7, -7 };
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] = { 0x3800 /* 0.5. */,
++ 0x4500 /* 5. */,
++ 0x4E00 /* 24. */,
++ 0x5240 /* 50. */,
++ 0xB800 /* -0.5. */,
++ 0xC500 /* -5. */,
++ 0x4300 /* 3.5. */,
++ 0xC300 /* -3.5. */ };
++
++uint16_t expected_2[] = { 0x3400 /* 0.25. */,
++ 0x4100 /* 2.5. */,
++ 0x4A00 /* 12. */,
++ 0x4E40 /* 25. */,
++ 0xB400 /* -0.25. */,
++ 0xC100 /* -2.5. */,
++ 0x3F00 /* 1.75. */,
++ 0xBF00 /* -1.75. */ };
++
++#define TEST_MSG "VCVTH_N_F16_S64"
++#define INSN_NAME vcvth_n_f16_s64
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE int64_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u16_1.c
+@@ -0,0 +1,46 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++uint16_t input[] = { 1, 10, 48, 100, 1000, 0, 500, 9 };
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] = { 0x3800 /* 0.5. */,
++ 0x4500 /* 5. */,
++ 0x4E00 /* 24. */,
++ 0x5240 /* 50. */,
++ 0x5FD0 /* 500. */,
++ 0x0000 /* 0.0. */,
++ 0x5BD0 /* 250. */,
++ 0x4480 /* 4.5. */ };
++
++uint16_t expected_2[] = { 0x3400 /* 0.25. */,
++ 0x4100 /* 2.5. */,
++ 0x4A00 /* 12. */,
++ 0x4E40 /* 25. */,
++ 0x5BD0 /* 250. */,
++ 0x0000 /* 0.0. */,
++ 0x57D0 /* 125. */,
++ 0x4080 /* 2.25. */ };
++
++#define TEST_MSG "VCVTH_N_F16_U16"
++#define INSN_NAME vcvth_n_f16_u16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE uint16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c
+@@ -0,0 +1,99 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++uint32_t input[] =
++{
++ 0, -0,
++ 123, -567,
++ -34, 1024,
++ -63, 169,
++ -4, 77,
++ -144, -56,
++ -16, -15,
++ -14, -13,
++};
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x53b0 /* 61.500000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x6000 /* 512.000000 */,
++ 0x7c00 /* inf */,
++ 0x5548 /* 84.500000 */,
++ 0x7c00 /* inf */,
++ 0x50d0 /* 38.500000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */
++};
++
++uint16_t expected_2[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x4fb0 /* 30.750000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x5c00 /* 256.000000 */,
++ 0x7c00 /* inf */,
++ 0x5148 /* 42.250000 */,
++ 0x7c00 /* inf */,
++ 0x4cd0 /* 19.250000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */
++};
++
++uint16_t expected_3[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x0004 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x0001 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */
++};
++
++#define TEST_MSG "VCVTH_N_F16_U32"
++#define INSN_NAME vcvth_n_f16_u32
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++#define EXPECTED_3 expected_3
++
++#define INPUT_TYPE uint32_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++#define SCALAR_3 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u64_1.c
+@@ -0,0 +1,46 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++uint64_t input[] = { 1, 10, 48, 100, 1000, 0, 500, 9 };
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected_1[] = { 0x3800 /* 0.5. */,
++ 0x4500 /* 5. */,
++ 0x4E00 /* 24. */,
++ 0x5240 /* 50. */,
++ 0x5FD0 /* 500. */,
++ 0x0000 /* 0.0. */,
++ 0x5BD0 /* 250. */,
++ 0x4480 /* 4.5. */ };
++
++uint16_t expected_2[] = { 0x3400 /* 0.25. */,
++ 0x4100 /* 2.5. */,
++ 0x4A00 /* 12. */,
++ 0x4E40 /* 25. */,
++ 0x5BD0 /* 250. */,
++ 0x0000 /* 0.0. */,
++ 0x57D0 /* 125. */,
++ 0x4080 /* 2.25. */ };
++
++#define TEST_MSG "VCVTH_N_F16_U64"
++#define INSN_NAME vcvth_n_f16_u64
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE uint64_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s16_f16_1.c
+@@ -0,0 +1,29 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 2.5, 100, 7.1, -9.9, -5.0, 9.1, -4.8, 77 };
++int16_t expected_1[] = { 5, 200, 14, -19, -10, 18, -9, 154 };
++int16_t expected_2[] = { 10, 400, 28, -39, -20, 36, -19, 308 };
++
++#define TEST_MSG "VCVTH_N_S16_F16"
++#define INSN_NAME vcvth_n_s16_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c
+@@ -0,0 +1,100 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
++{
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
++
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected_1[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x000000f6,
++ 0xfffffb90,
++ 0xffffffbb,
++ 0x00000800,
++ 0x0000052e,
++ 0x00000152,
++ 0xfffffff7,
++ 0x0000009a,
++ 0xfffffedf,
++ 0xffffff8f,
++ 0xffffffe0,
++ 0xffffffe2,
++ 0xffffffe4,
++ 0xffffffe6,
++};
++
++uint32_t expected_2[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x000001ed,
++ 0xfffff720,
++ 0xffffff75,
++ 0x00001000,
++ 0x00000a5c,
++ 0x000002a4,
++ 0xffffffed,
++ 0x00000134,
++ 0xfffffdbe,
++ 0xffffff1d,
++ 0xffffffc0,
++ 0xffffffc4,
++ 0xffffffc8,
++ 0xffffffcc,
++};
++
++uint32_t expected_3[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x7fffffff,
++ 0x80000000,
++ 0x80000000,
++ 0x7fffffff,
++ 0x7fffffff,
++ 0x7fffffff,
++ 0x80000000,
++ 0x7fffffff,
++ 0x80000000,
++ 0x80000000,
++ 0x80000000,
++ 0x80000000,
++ 0x80000000,
++ 0x80000000,
++};
++
++#define TEST_MSG "VCVTH_N_S32_F16"
++#define INSN_NAME vcvth_n_s32_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++#define EXPECTED_3 expected_3
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++#define SCALAR_3 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s64_f16_1.c
+@@ -0,0 +1,29 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 2.5, 100, 7.1, -9.9, -5.0, 9.1, -4.8, 77 };
++int64_t expected_1[] = { 5, 200, 14, -19, -10, 18, -9, 154 };
++int64_t expected_2[] = { 10, 400, 28, -39, -20, 36, -19, 308 };
++
++#define TEST_MSG "VCVTH_N_S64_F16"
++#define INSN_NAME vcvth_n_s64_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u16_f16_1.c
+@@ -0,0 +1,29 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 2.5, 100, 7.1, 9.9, 5.0, 9.1, 4.8, 77 };
++uint16_t expected_1[] = {5, 200, 14, 19, 10, 18, 9, 154};
++uint16_t expected_2[] = {10, 400, 28, 39, 20, 36, 19, 308};
++
++#define TEST_MSG "VCVTH_N_U16_F16"
++#define INSN_NAME vcvth_n_u16_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c
+@@ -0,0 +1,100 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
++{
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
++
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected_1[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x000000f6,
++ 0x00000000,
++ 0x00000000,
++ 0x00000800,
++ 0x0000052e,
++ 0x00000152,
++ 0x00000000,
++ 0x0000009a,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
++
++uint32_t expected_2[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x000001ed,
++ 0x00000000,
++ 0x00000000,
++ 0x00001000,
++ 0x00000a5c,
++ 0x000002a4,
++ 0x00000000,
++ 0x00000134,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
++
++uint32_t expected_3[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0xffffffff,
++ 0x00000000,
++ 0x00000000,
++ 0xffffffff,
++ 0xffffffff,
++ 0xffffffff,
++ 0x00000000,
++ 0xffffffff,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
++
++#define TEST_MSG "VCVTH_N_U32_F16"
++#define INSN_NAME vcvth_n_u32_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++#define EXPECTED_3 expected_3
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++#define SCALAR_3 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u64_f16_1.c
+@@ -0,0 +1,29 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 2.5, 100, 7.1, 9.9, 5.0, 9.1, 4.8, 77 };
++uint64_t expected_1[] = { 5, 200, 14, 19, 10, 18, 9, 154 };
++uint64_t expected_2[] = { 10, 400, 28, 39, 20, 36, 19, 308 };
++
++#define TEST_MSG "VCVTH_N_U64_F16"
++#define INSN_NAME vcvth_n_u64_f16
++
++#define INPUT input
++#define EXPECTED_1 expected_1
++#define EXPECTED_2 expected_2
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++#define SCALAR_OPERANDS
++#define SCALAR_1 1
++#define SCALAR_2 2
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int16_t expected[] = { 123, -56, 0, 24, -63, 169, -4, 77 };
++
++#define TEST_MSG "VCVTH_S16_F16"
++#define INSN_NAME vcvth_s16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
++{
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
++
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
++{
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0xfffffdc8,
++ 0xffffffde,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0xfffffffc,
++ 0x0000004d,
++ 0xffffff70,
++ 0xffffffc8,
++ 0xfffffff0,
++ 0xfffffff1,
++ 0xfffffff2,
++ 0xfffffff3,
++};
++
++#define TEST_MSG "VCVTH_S32_F16"
++#define INSN_NAME vcvth_s32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_s64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int64_t expected[] = { 123, -56, 0, 24, -63, 169, -4, 77 };
++
++#define TEST_MSG "VCVTH_S64_F16"
++#define INSN_NAME vcvth_s64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint16_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
++
++#define TEST_MSG "VCVTH_u16_F16"
++#define INSN_NAME vcvth_u16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_fabdsf (__a, __b);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-+vabdd_f64 (float64_t __a, float64_t __b)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return __builtin_aarch64_fabddf (__a, __b);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0x00000000,
++ 0x00000000,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0x00000000,
++ 0x0000004d,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vabd_f32 (float32x2_t __a, float32x2_t __b)
++#define TEST_MSG "VCVTH_U32_F16"
++#define INSN_NAME vcvth_u32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvth_u64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint64_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
++
++#define TEST_MSG "VCVTH_u64_F16"
++#define INSN_NAME vcvth_u64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtm_1.c
+@@ -0,0 +1,33 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++#include <math.h>
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff0, 0x5, 0xfff0, 0x5 };
++VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x5, 0x0, 0x5 };
++VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0xf, 0xfff0, 0x0,
++ 0x0, 0xf, 0xfff0 };
++VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0xf, 0x0,
++ 0x0, 0x0, 0xf, 0x0 };
++#endif
++
++/* Expected results with rounding. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xa, 0xa, 0xa, 0xa };
++VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
++ 0x7d, 0x7d, 0x7d, 0x7d };
++VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7d, 0x7d, 0x7d, 0x7d,
++ 0x7d, 0x7d, 0x7d, 0x7d };
++#endif
++
++#define TEST_MSG "VCVTM/VCVTMQ"
++#define INSN_NAME vcvtm
++
++#include "vcvtX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int16_t expected[] = { 123, -57, 0, 24, -64, 169, -5, 77 };
++
++#define TEST_MSG "VCVTMH_S16_F16"
++#define INSN_NAME vcvtmh_s16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_fabdv2sf (__a, __b);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-+vabd_f64 (float64x1_t __a, float64x1_t __b)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return (float64x1_t) {vabdd_f64 (vget_lane_f64 (__a, 0),
-+ vget_lane_f64 (__b, 0))};
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0xfffffdc8,
++ 0xffffffdd,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0xfffffffb,
++ 0x0000004d,
++ 0xffffff6f,
++ 0xffffffc7,
++ 0xfffffff0,
++ 0xfffffff1,
++ 0xfffffff2,
++ 0xfffffff3
++};
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vabdq_f32 (float32x4_t __a, float32x4_t __b)
++#define TEST_MSG "VCVTMH_S32_F16"
++#define INSN_NAME vcvtmh_s32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int64_t expected[] = { 123, -57, 0, 24, -64, 169, -5, 77 };
++
++#define TEST_MSG "VCVTMH_S64_F16"
++#define INSN_NAME vcvtmh_s64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint16_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
++
++#define TEST_MSG "VCVTMH_u16_F16"
++#define INSN_NAME vcvtmh_u16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_fabdv4sf (__a, __b);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vabdq_f64 (float64x2_t __a, float64x2_t __b)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return __builtin_aarch64_fabdv2df (__a, __b);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0x00000000,
++ 0x00000000,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0x00000000,
++ 0x0000004d,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
+
- /* vabs */
-
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-@@ -13026,84 +12329,208 @@ vcnt_p8 (poly8x8_t __a)
- __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
- vcnt_s8 (int8x8_t __a)
- {
-- return __builtin_aarch64_popcountv8qi (__a);
-+ return __builtin_aarch64_popcountv8qi (__a);
-+}
++#define TEST_MSG "VCVTMH_U32_F16"
++#define INSN_NAME vcvtmh_u32_f16
+
-+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-+vcnt_u8 (uint8x8_t __a)
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint64_t expected[] = { 123, 56, 0, 24, 63, 169, 4, 77 };
++
++#define TEST_MSG "VCVTMH_u64_F16"
++#define INSN_NAME vcvtmh_u64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int16_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
++
++#define TEST_MSG "VCVTNH_S16_F16"
++#define INSN_NAME vcvtnh_s16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return (uint8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-+vcntq_p8 (poly8x16_t __a)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return (poly8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0xfffffdc8,
++ 0xffffffdd,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0xfffffffb,
++ 0x0000004d,
++ 0xffffff70,
++ 0xffffffc7,
++ 0xfffffff0,
++ 0xfffffff1,
++ 0xfffffff2,
++ 0xfffffff3
++};
+
-+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-+vcntq_s8 (int8x16_t __a)
++#define TEST_MSG "VCVTNH_S32_F16"
++#define INSN_NAME vcvtnh_s32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int64_t expected[] = { 124, -57, 1, 25, -64, 169, -4, 77 };
++
++#define TEST_MSG "VCVTNH_S64_F16"
++#define INSN_NAME vcvtnh_s64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint16_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
++
++#define TEST_MSG "VCVTNH_u16_F16"
++#define INSN_NAME vcvtnh_u16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_popcountv16qi (__a);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-+vcntq_u8 (uint8x16_t __a)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return (uint8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007b,
++ 0x00000000,
++ 0x00000000,
++ 0x00000400,
++ 0x00000297,
++ 0x000000a9,
++ 0x00000000,
++ 0x0000004d,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
+
-+/* vcvt (double -> float). */
++#define TEST_MSG "VCVTNH_U32_F16"
++#define INSN_NAME vcvtnh_u32_f16
+
-+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
-+vcvt_f16_f32 (float32x4_t __a)
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint64_t expected[] = { 124, 57, 1, 25, 64, 169, 4, 77 };
++
++#define TEST_MSG "VCVTNH_u64_F16"
++#define INSN_NAME vcvtnh_u64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtp_1.c
+@@ -0,0 +1,33 @@
++/* This file tests an intrinsic which currently has only an f16 variant and that
++ is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++#include <math.h>
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, int, 16, 4) [] = { 0xfff1, 0x6, 0xfff1, 0x6 };
++VECT_VAR_DECL(expected, uint, 16, 4) [] = { 0x0, 0x6, 0x0, 0x6 };
++VECT_VAR_DECL(expected, int, 16, 8) [] = { 0x0, 0x0, 0x10, 0xfff1,
++ 0x0, 0x0, 0x10, 0xfff1 };
++VECT_VAR_DECL(expected, uint, 16, 8) [] = { 0x0, 0x0, 0x10, 0x0,
++ 0x0, 0x0, 0x10, 0x0 };
++#endif
++
++/* Expected results with rounding. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_rounding, int, 16, 4) [] = { 0xb, 0xb, 0xb, 0xb };
++VECT_VAR_DECL(expected_rounding, uint, 16, 4) [] = { 0xb, 0xb, 0xb, 0xb };
++VECT_VAR_DECL(expected_rounding, int, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
++ 0x7e, 0x7e, 0x7e, 0x7e };
++VECT_VAR_DECL(expected_rounding, uint, 16, 8) [] = { 0x7e, 0x7e, 0x7e, 0x7e,
++ 0x7e, 0x7e, 0x7e, 0x7e };
++#endif
++
++#define TEST_MSG "VCVTP/VCVTPQ"
++#define INSN_NAME vcvtp
++
++#include "vcvtX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int16_t expected[] = { 124, -56, 1, 25, -63, 170, -4, 77 };
++
++#define TEST_MSG "VCVTPH_S16_F16"
++#define INSN_NAME vcvtph_s16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_float_truncate_lo_v4hf (__a);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
-+vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007c,
++ 0xfffffdc8,
++ 0xffffffde,
++ 0x00000400,
++ 0x00000297,
++ 0x000000aa,
++ 0xfffffffc,
++ 0x0000004d,
++ 0xffffff70,
++ 0xffffffc8,
++ 0xfffffff0,
++ 0xfffffff1,
++ 0xfffffff2,
++ 0xfffffff3
++};
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vcvt_f32_f64 (float64x2_t __a)
++#define TEST_MSG "VCVTPH_S32_F16"
++#define INSN_NAME vcvtph_s32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_s64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, -56.8, 0.7, 24.6, -63.5, 169.4, -4.3, 77.0 };
++int64_t expected[] = { 124, -56, 1, 25, -63, 170, -4, 77 };
++
++#define TEST_MSG "VCVTPH_S64_F16"
++#define INSN_NAME vcvtph_s64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE int64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u16_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint16_t expected[] = { 124, 57, 1, 25, 64, 170, 5, 77 };
++
++#define TEST_MSG "VCVTPH_u16_F16"
++#define INSN_NAME vcvtph_u16_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c
+@@ -0,0 +1,53 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] =
+{
-+ return __builtin_aarch64_float_truncate_lo_v2sf (__a);
-+}
++ 0.0, -0.0,
++ 123.4, -567.8,
++ -34.8, 1024,
++ 663.1, 169.1,
++ -4.8, 77.0,
++ -144.5, -56.8,
++
++ (float16_t) -16, (float16_t) -15,
++ (float16_t) -14, (float16_t) -13,
++};
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
++/* Expected results (32-bit hexadecimal representation). */
++uint32_t expected[] =
+{
-+ return __builtin_aarch64_float_truncate_hi_v4sf (__a, __b);
-+}
++ 0x00000000,
++ 0x00000000,
++ 0x0000007c,
++ 0x00000000,
++ 0x00000000,
++ 0x00000400,
++ 0x00000297,
++ 0x000000aa,
++ 0x00000000,
++ 0x0000004d,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++ 0x00000000,
++};
+
-+/* vcvt (float -> double). */
++#define TEST_MSG "VCVTPH_U32_F16"
++#define INSN_NAME vcvtph_u32_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint32_t
++#define OUTPUT_TYPE_SIZE 32
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvtph_u64_f16_1.c
+@@ -0,0 +1,23 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.9, 56.8, 0.7, 24.6, 63.5, 169.4, 4.3, 77.0 };
++uint64_t expected[] = { 124, 57, 1, 25, 64, 170, 5, 77 };
++
++#define TEST_MSG "VCVTPH_u64_F16"
++#define INSN_NAME vcvtph_u64_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE uint64_t
++#define OUTPUT_TYPE_SIZE 64
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdiv_f16_1.c
+@@ -0,0 +1,86 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vcvt_f32_f16 (float16x4_t __a)
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (-56.8)
++#define C FP16_C (-34.8)
++#define D FP16_C (12)
++#define E FP16_C (63.1)
++#define F FP16_C (19.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (77)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-8)
++#define O FP16_C (-1.1)
++#define P FP16_C (-9.7)
++
++/* Expected results for vdiv. */
++VECT_VAR_DECL (expected_div_static, hfloat, 16, 4) []
++ = { 0x32CC /* A / E. */, 0xC1F3 /* B / F. */,
++ 0x4740 /* C / G. */, 0x30FD /* D / H. */ };
++
++VECT_VAR_DECL (expected_div_static, hfloat, 16, 8) []
++ = { 0x32CC /* A / E. */, 0xC1F3 /* B / F. */,
++ 0x4740 /* C / G. */, 0x30FD /* D / H. */,
++ 0x201D /* I / M. */, 0x48E0 /* J / N. */,
++ 0xC91B /* K / O. */, 0xC90D /* L / P. */ };
++
++void exec_vdiv_f16 (void)
+{
-+ return __builtin_aarch64_float_extend_lo_v4sf (__a);
-+}
++#undef TEST_MSG
++#define TEST_MSG "VDIV (FP16)"
++ clean_results ();
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vcvt_f64_f32 (float32x2_t __a)
-+{
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
+
-+ return __builtin_aarch64_float_extend_lo_v2df (__a);
-+}
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vdiv_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vcvt_high_f32_f16 (float16x8_t __a)
-+{
-+ return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_div_static, "");
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vcvt_high_f64_f32 (float32x4_t __a)
-+{
-+ return __builtin_aarch64_vec_unpacks_hi_v4sf (__a);
-+}
++#undef TEST_MSG
++#define TEST_MSG "VDIVQ (FP16)"
++ clean_results ();
+
-+/* vcvt (<u>fixed-point -> float). */
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
+
-+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-+vcvtd_n_f64_s64 (int64_t __a, const int __b)
-+{
-+ return __builtin_aarch64_scvtfdi (__a, __b);
-+}
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vdivq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-+vcvtd_n_f64_u64 (uint64_t __a, const int __b)
-+{
-+ return __builtin_aarch64_ucvtfdi_sus (__a, __b);
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_div_static, "");
+}
+
-+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-+vcvts_n_f32_s32 (int32_t __a, const int __b)
++int
++main (void)
+{
-+ return __builtin_aarch64_scvtfsi (__a, __b);
++ exec_vdiv_f16 ();
++ return 0;
+}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-+vcvts_n_f32_u32 (uint32_t __a, const int __b)
-+{
-+ return __builtin_aarch64_ucvtfsi_sus (__a, __b);
-+}
++#include <arm_fp16.h>
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vcvt_n_f32_s32 (int32x2_t __a, const int __b)
-+{
-+ return __builtin_aarch64_scvtfv2si (__a, __b);
-+}
++#define INFF __builtin_inf ()
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vcvt_n_f32_u32 (uint32x2_t __a, const int __b)
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
+{
-+ return __builtin_aarch64_ucvtfv2si_sus (__a, __b);
- }
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0xb765 /* -0.462158 */,
++ 0x27ef /* 0.030991 */,
++ 0x3955 /* 0.666504 */,
++ 0xccff /* -19.984375 */,
++ 0xc49a /* -4.601562 */,
++ 0xb1e3 /* -0.183960 */,
++ 0x3cd3 /* 1.206055 */,
++ 0x23f0 /* 0.015503 */,
++ 0xa9ef /* -0.046356 */,
++ 0x32f4 /* 0.217285 */,
++ 0xb036 /* -0.131592 */,
++ 0x4126 /* 2.574219 */,
++ 0xcd15 /* -20.328125 */,
++ 0x537f /* 59.968750 */,
++ 0x7e00 /* nan */,
++ 0x7e00 /* nan */
++};
++
++#define TEST_MSG "VDIVH_F16"
++#define INSN_NAME vdivh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c
+@@ -19,6 +19,10 @@ VECT_VAR_DECL(expected0,uint,64,1) [] = { 0xfffffffffffffff0 };
+ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+ VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcc00,
++ 0xcc00, 0xcc00 };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1800000 };
+ VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0,
+@@ -46,6 +50,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf0, 0xf0, 0xf0,
+ 0xf0, 0xf0, 0xf0, 0xf0 };
+ VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff0, 0xfff0, 0xfff0,
+ 0xfff0, 0xfff0, 0xfff0, 0xfff0 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcc00,
++ 0xcc00, 0xcc00,
++ 0xcc00, 0xcc00,
++ 0xcc00, 0xcc00 };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1800000,
+ 0xc1800000, 0xc1800000 };
--__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
--vcnt_u8 (uint8x8_t __a)
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vcvtq_n_f32_s32 (int32x4_t __a, const int __b)
- {
-- return (uint8x8_t) __builtin_aarch64_popcountv8qi ((int8x8_t) __a);
-+ return __builtin_aarch64_scvtfv4si (__a, __b);
- }
+@@ -63,6 +73,10 @@ VECT_VAR_DECL(expected1,uint,64,1) [] = { 0xfffffffffffffff1 };
+ VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+ 0xf1, 0xf1, 0xf1, 0xf1 };
+ VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb80, 0xcb80,
++ 0xcb80, 0xcb80 };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
+ VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+ 0xf1, 0xf1, 0xf1, 0xf1,
+@@ -90,6 +104,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf1, 0xf1, 0xf1, 0xf1,
+ 0xf1, 0xf1, 0xf1, 0xf1 };
+ VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+ 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xcb80, 0xcb80,
++ 0xcb80, 0xcb80,
++ 0xcb80, 0xcb80,
++ 0xcb80, 0xcb80 };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
+ 0xc1700000, 0xc1700000 };
--__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
--vcntq_p8 (poly8x16_t __a)
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vcvtq_n_f32_u32 (uint32x4_t __a, const int __b)
- {
-- return (poly8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
-+ return __builtin_aarch64_ucvtfv4si_sus (__a, __b);
- }
+@@ -107,6 +127,10 @@ VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff2 };
+ VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf2, 0xf2, 0xf2, 0xf2 };
+ VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb00, 0xcb00,
++ 0xcb00, 0xcb00 };
++#endif
+ VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1600000, 0xc1600000 };
+ VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf2, 0xf2, 0xf2, 0xf2,
+@@ -134,6 +158,12 @@ VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf2, 0xf2, 0xf2, 0xf2 };
+ VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2,
+ 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb00, 0xcb00,
++ 0xcb00, 0xcb00,
++ 0xcb00, 0xcb00,
++ 0xcb00, 0xcb00 };
++#endif
+ VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1600000, 0xc1600000,
+ 0xc1600000, 0xc1600000 };
--__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
--vcntq_s8 (int8x16_t __a)
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vcvtq_n_f64_s64 (int64x2_t __a, const int __b)
- {
-- return __builtin_aarch64_popcountv16qi (__a);
-+ return __builtin_aarch64_scvtfv2di (__a, __b);
- }
+@@ -171,6 +201,9 @@ void exec_vdup_vmov (void)
+ TEST_VDUP(, uint, u, 64, 1);
+ TEST_VDUP(, poly, p, 8, 8);
+ TEST_VDUP(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP(, float, f, 16, 4);
++#endif
+ TEST_VDUP(, float, f, 32, 2);
--__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
--vcntq_u8 (uint8x16_t __a)
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vcvtq_n_f64_u64 (uint64x2_t __a, const int __b)
- {
-- return (uint8x16_t) __builtin_aarch64_popcountv16qi ((int8x16_t) __a);
-+ return __builtin_aarch64_ucvtfv2di_sus (__a, __b);
- }
+ TEST_VDUP(q, int, s, 8, 16);
+@@ -183,8 +216,26 @@ void exec_vdup_vmov (void)
+ TEST_VDUP(q, uint, u, 64, 2);
+ TEST_VDUP(q, poly, p, 8, 16);
+ TEST_VDUP(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP(q, float, f, 16, 8);
++#endif
+ TEST_VDUP(q, float, f, 32, 4);
--/* vcvt (double -> float). */
-+/* vcvt (float -> <u>fixed-point). */
++#if defined (FP16_SUPPORTED)
++ switch (i) {
++ case 0:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
++ break;
++ case 1:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
++ break;
++ case 2:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++ break;
++ default:
++ abort();
++ }
++#else
+ switch (i) {
+ case 0:
+ CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
+@@ -198,6 +249,7 @@ void exec_vdup_vmov (void)
+ default:
+ abort();
+ }
++#endif
+ }
--__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
--vcvt_f16_f32 (float32x4_t __a)
-+__extension__ static __inline int64_t __attribute__ ((__always_inline__))
-+vcvtd_n_s64_f64 (float64_t __a, const int __b)
- {
-- return __builtin_aarch64_float_truncate_lo_v4hf (__a);
-+ return __builtin_aarch64_fcvtzsdf (__a, __b);
- }
+ /* Do the same tests with vmov. Use the same expected results. */
+@@ -216,6 +268,9 @@ void exec_vdup_vmov (void)
+ TEST_VMOV(, uint, u, 64, 1);
+ TEST_VMOV(, poly, p, 8, 8);
+ TEST_VMOV(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VMOV(, float, f, 16, 4);
++#endif
+ TEST_VMOV(, float, f, 32, 2);
--__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
--vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
-+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
-+vcvtd_n_u64_f64 (float64_t __a, const int __b)
- {
-- return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
-+ return __builtin_aarch64_fcvtzudf_uss (__a, __b);
- }
+ TEST_VMOV(q, int, s, 8, 16);
+@@ -228,8 +283,26 @@ void exec_vdup_vmov (void)
+ TEST_VMOV(q, uint, u, 64, 2);
+ TEST_VMOV(q, poly, p, 8, 16);
+ TEST_VMOV(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VMOV(q, float, f, 16, 8);
++#endif
+ TEST_VMOV(q, float, f, 32, 4);
--__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
--vcvt_f32_f64 (float64x2_t __a)
-+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
-+vcvts_n_s32_f32 (float32_t __a, const int __b)
- {
-- return __builtin_aarch64_float_truncate_lo_v2sf (__a);
-+ return __builtin_aarch64_fcvtzssf (__a, __b);
++#if defined (FP16_SUPPORTED)
++ switch (i) {
++ case 0:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected0, "");
++ break;
++ case 1:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected1, "");
++ break;
++ case 2:
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++ break;
++ default:
++ abort();
++ }
++#else
+ switch (i) {
+ case 0:
+ CHECK_RESULTS_NAMED_NO_FP16 (TEST_MSG, expected0, "");
+@@ -243,6 +316,8 @@ void exec_vdup_vmov (void)
+ default:
+ abort();
+ }
++#endif
++
+ }
}
--__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
--vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
-+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
-+vcvts_n_u32_f32 (float32_t __a, const int __b)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c
+@@ -17,6 +17,10 @@ VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf7, 0xf7, 0xf7, 0xf7,
+ 0xf7, 0xf7, 0xf7, 0xf7 };
+ VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xca80, 0xca80,
++ 0xca80, 0xca80 };
++#endif
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf2, 0xf2, 0xf2, 0xf2,
+ 0xf2, 0xf2, 0xf2, 0xf2,
+@@ -43,10 +47,16 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
+ 0xf5, 0xf5, 0xf5, 0xf5 };
+ VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
+ 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xca80, 0xca80,
++ 0xca80, 0xca80,
++ 0xca80, 0xca80,
++ 0xca80, 0xca80 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
+ 0xc1700000, 0xc1700000 };
+
+-#define TEST_MSG "VDUP_LANE/VDUP_LANEQ"
++#define TEST_MSG "VDUP_LANE/VDUPQ_LANE"
+ void exec_vdup_lane (void)
{
-- return __builtin_aarch64_float_truncate_hi_v4sf (__a, __b);
-+ return __builtin_aarch64_fcvtzusf_uss (__a, __b);
+ /* Basic test: vec1=vdup_lane(vec2, lane), then store the result. */
+@@ -63,6 +73,9 @@ void exec_vdup_lane (void)
+ clean_results ();
+
+ TEST_MACRO_64BITS_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+
+ /* Choose lane arbitrarily. */
+@@ -76,6 +89,9 @@ void exec_vdup_lane (void)
+ TEST_VDUP_LANE(, uint, u, 64, 1, 1, 0);
+ TEST_VDUP_LANE(, poly, p, 8, 8, 8, 7);
+ TEST_VDUP_LANE(, poly, p, 16, 4, 4, 3);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP_LANE(, float, f, 16, 4, 4, 3);
++#endif
+ TEST_VDUP_LANE(, float, f, 32, 2, 2, 1);
+
+ TEST_VDUP_LANE(q, int, s, 8, 16, 8, 2);
+@@ -88,9 +104,133 @@ void exec_vdup_lane (void)
+ TEST_VDUP_LANE(q, uint, u, 64, 2, 1, 0);
+ TEST_VDUP_LANE(q, poly, p, 8, 16, 8, 5);
+ TEST_VDUP_LANE(q, poly, p, 16, 8, 4, 1);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP_LANE(q, float, f, 16, 8, 4, 3);
++#endif
+ TEST_VDUP_LANE(q, float, f, 32, 4, 2, 1);
+
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
+ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
++
++#if defined (__aarch64__)
++
++#undef TEST_MSG
++#define TEST_MSG "VDUP_LANEQ/VDUPQ_LANEQ"
++
++ /* Expected results for vdup*_laneq tests. */
++VECT_VAR_DECL(expected2,int,8,8) [] = { 0xfd, 0xfd, 0xfd, 0xfd,
++ 0xfd, 0xfd, 0xfd, 0xfd };
++VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
++VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0xfffffff1 };
++VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xff, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
++VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0xfffffff1 };
++VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf7, 0xf7, 0xf7, 0xf7,
++ 0xf7, 0xf7, 0xf7, 0xf7 };
++VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3 };
++VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xca80, 0xca80,
++ 0xca80, 0xca80 };
++#endif
++VECT_VAR_DECL(expected2,int,8,16) [] = { 0xfb, 0xfb, 0xfb, 0xfb,
++ 0xfb, 0xfb, 0xfb, 0xfb,
++ 0xfb, 0xfb, 0xfb, 0xfb,
++ 0xfb, 0xfb, 0xfb, 0xfb };
++VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff7, 0xfff7, 0xfff7, 0xfff7,
++ 0xfff7, 0xfff7, 0xfff7, 0xfff7 };
++VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0xfffffff1,
++ 0xfffffff1, 0xfffffff1 };
++VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5 };
++VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
++ 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
++VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff0, 0xfffffff0,
++ 0xfffffff0, 0xfffffff0 };
++VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5,
++ 0xf5, 0xf5, 0xf5, 0xf5 };
++VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0xfff1, 0xfff1, 0xfff1,
++ 0xfff1, 0xfff1, 0xfff1, 0xfff1 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xc880, 0xc880,
++ 0xc880, 0xc880,
++ 0xc880, 0xc880,
++ 0xc880, 0xc880 };
++#endif
++VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0xc1700000,
++ 0xc1700000, 0xc1700000 };
++
++ /* Clean all results for vdup*_laneq tests. */
++ clean_results ();
++ /* Basic test: vec1=vdup_lane(vec2, lane), then store the result. */
++#define TEST_VDUP_LANEQ(Q, T1, T2, W, N, N2, L) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ vdup##Q##_laneq_##T2##W(VECT_VAR(vector, T1, W, N2), L); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
++
++ /* Input vector can only have 64 bits. */
++ DECL_VARIABLE_128BITS_VARIANTS(vector);
++
++ clean_results ();
++
++ TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vector, buffer, q, float, f, 32, 4);
++
++ /* Choose lane arbitrarily. */
++ TEST_VDUP_LANEQ(, int, s, 8, 8, 16, 13);
++ TEST_VDUP_LANEQ(, int, s, 16, 4, 8, 2);
++ TEST_VDUP_LANEQ(, int, s, 32, 2, 4, 1);
++ TEST_VDUP_LANEQ(, int, s, 64, 1, 2, 0);
++ TEST_VDUP_LANEQ(, uint, u, 8, 8, 16, 15);
++ TEST_VDUP_LANEQ(, uint, u, 16, 4, 8, 3);
++ TEST_VDUP_LANEQ(, uint, u, 32, 2, 4, 1);
++ TEST_VDUP_LANEQ(, uint, u, 64, 1, 2, 0);
++ TEST_VDUP_LANEQ(, poly, p, 8, 8, 16, 7);
++ TEST_VDUP_LANEQ(, poly, p, 16, 4, 8, 3);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP_LANEQ(, float, f, 16, 4, 8, 3);
++#endif
++ TEST_VDUP_LANEQ(, float, f, 32, 2, 4, 1);
++
++ TEST_VDUP_LANEQ(q, int, s, 8, 16, 16, 11);
++ TEST_VDUP_LANEQ(q, int, s, 16, 8, 8, 7);
++ TEST_VDUP_LANEQ(q, int, s, 32, 4, 4, 1);
++ TEST_VDUP_LANEQ(q, int, s, 64, 2, 2, 0);
++ TEST_VDUP_LANEQ(q, uint, u, 8, 16, 16, 5);
++ TEST_VDUP_LANEQ(q, uint, u, 16, 8, 8, 1);
++ TEST_VDUP_LANEQ(q, uint, u, 32, 4, 4, 0);
++ TEST_VDUP_LANEQ(q, uint, u, 64, 2, 2, 0);
++ TEST_VDUP_LANEQ(q, poly, p, 8, 16, 16, 5);
++ TEST_VDUP_LANEQ(q, poly, p, 16, 8, 8, 1);
++#if defined (FP16_SUPPORTED)
++ TEST_VDUP_LANEQ(q, float, f, 16, 8, 8, 7);
++#endif
++ TEST_VDUP_LANEQ(q, float, f, 32, 4, 4, 1);
++
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++#if defined (FP16_SUPPORTED)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
++#endif
++
++#endif /* __aarch64__. */
}
--/* vcvt (float -> double). */
-+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-+vcvt_n_s32_f32 (float32x2_t __a, const int __b)
+ int main (void)
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c
+@@ -0,0 +1,137 @@
++/* { dg-do run } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define A -16
++#define B -15
++#define C -14
++#define D -13
++#define E -12
++#define F -11
++#define G -10
++#define H -9
++
++#define F16_C(a) ((__fp16) a)
++#define AF F16_C (A)
++#define BF F16_C (B)
++#define CF F16_C (C)
++#define DF F16_C (D)
++#define EF F16_C (E)
++#define FF F16_C (F)
++#define GF F16_C (G)
++#define HF F16_C (H)
++
++#define S16_C(a) ((int16_t) a)
++#define AS S16_C (A)
++#define BS S16_C (B)
++#define CS S16_C (C)
++#define DS S16_C (D)
++#define ES S16_C (E)
++#define FS S16_C (F)
++#define GS S16_C (G)
++#define HS S16_C (H)
++
++#define U16_C(a) ((int16_t) a)
++#define AU U16_C (A)
++#define BU U16_C (B)
++#define CU U16_C (C)
++#define DU U16_C (D)
++#define EU U16_C (E)
++#define FU U16_C (F)
++#define GU U16_C (G)
++#define HU U16_C (H)
++
++#define P16_C(a) ((poly16_t) a)
++#define AP P16_C (A)
++#define BP P16_C (B)
++#define CP P16_C (C)
++#define DP P16_C (D)
++#define EP P16_C (E)
++#define FP P16_C (F)
++#define GP P16_C (G)
++#define HP P16_C (H)
++
++/* Expected results for vduph_lane. */
++float16_t expected_f16 = AF;
++int16_t expected_s16 = DS;
++uint16_t expected_u16 = BU;
++poly16_t expected_p16 = CP;
++
++/* Expected results for vduph_laneq. */
++float16_t expected_q_f16 = EF;
++int16_t expected_q_s16 = BS;
++uint16_t expected_q_u16 = GU;
++poly16_t expected_q_p16 = FP;
++
++void exec_vduph_lane_f16 (void)
+{
-+ return __builtin_aarch64_fcvtzsv2sf (__a, __b);
++ /* vduph_lane. */
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ DECL_VARIABLE(vsrc, int, 16, 4);
++ DECL_VARIABLE(vsrc, uint, 16, 4);
++ DECL_VARIABLE(vsrc, poly, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {AF, BF, CF, DF};
++ VECT_VAR_DECL (buf_src, int, 16, 4) [] = {AS, BS, CS, DS};
++ VECT_VAR_DECL (buf_src, uint, 16, 4) [] = {AU, BU, CU, DU};
++ VECT_VAR_DECL (buf_src, poly, 16, 4) [] = {AP, BP, CP, DP};
++ VLOAD (vsrc, buf_src, , int, s, 16, 4);
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ VLOAD (vsrc, buf_src, , uint, u, 16, 4);
++ VLOAD (vsrc, buf_src, , poly, p, 16, 4);
++
++ float16_t res_f = vduph_lane_f16 (VECT_VAR (vsrc, float, 16, 4), 0);
++ if (* (unsigned short *) &res_f != * (unsigned short *) &expected_f16)
++ abort ();
++
++ int16_t res_s = vduph_lane_s16 (VECT_VAR (vsrc, int, 16, 4), 3);
++ if (* (unsigned short *) &res_s != * (unsigned short *) &expected_s16)
++ abort ();
++
++ uint16_t res_u = vduph_lane_u16 (VECT_VAR (vsrc, uint, 16, 4), 1);
++ if (* (unsigned short *) &res_u != * (unsigned short *) &expected_u16)
++ abort ();
++
++ poly16_t res_p = vduph_lane_p16 (VECT_VAR (vsrc, poly, 16, 4), 2);
++ if (* (unsigned short *) &res_p != * (unsigned short *) &expected_p16)
++ abort ();
++
++ /* vduph_laneq. */
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ DECL_VARIABLE(vsrc, int, 16, 8);
++ DECL_VARIABLE(vsrc, uint, 16, 8);
++ DECL_VARIABLE(vsrc, poly, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {AF, BF, CF, DF, EF, FF, GF, HF};
++ VECT_VAR_DECL (buf_src, int, 16, 8) [] = {AS, BS, CS, DS, ES, FS, GS, HS};
++ VECT_VAR_DECL (buf_src, uint, 16, 8) [] = {AU, BU, CU, DU, EU, FU, GU, HU};
++ VECT_VAR_DECL (buf_src, poly, 16, 8) [] = {AP, BP, CP, DP, EP, FP, GP, HP};
++ VLOAD (vsrc, buf_src, q, int, s, 16, 8);
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ VLOAD (vsrc, buf_src, q, uint, u, 16, 8);
++ VLOAD (vsrc, buf_src, q, poly, p, 16, 8);
++
++ res_f = vduph_laneq_f16 (VECT_VAR (vsrc, float, 16, 8), 4);
++ if (* (unsigned short *) &res_f != * (unsigned short *) &expected_q_f16)
++ abort ();
++
++ res_s = vduph_laneq_s16 (VECT_VAR (vsrc, int, 16, 8), 1);
++ if (* (unsigned short *) &res_s != * (unsigned short *) &expected_q_s16)
++ abort ();
++
++ res_u = vduph_laneq_u16 (VECT_VAR (vsrc, uint, 16, 8), 6);
++ if (* (unsigned short *) &res_u != * (unsigned short *) &expected_q_u16)
++ abort ();
++
++ res_p = vduph_laneq_p16 (VECT_VAR (vsrc, poly, 16, 8), 5);
++ if (* (unsigned short *) &res_p != * (unsigned short *) &expected_q_p16)
++ abort ();
++}
++
++int
++main (void)
++{
++ exec_vduph_lane_f16 ();
++ return 0;
+}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vext.c
+@@ -16,6 +16,10 @@ VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
+ VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf6, 0xf7, 0x55, 0x55,
+ 0x55, 0x55, 0x55, 0x55 };
+ VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff2, 0xfff3, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcb00, 0xca80,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0xfe, 0xff, 0x11, 0x11,
+ 0x11, 0x11, 0x11, 0x11,
+@@ -39,6 +43,12 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0xfc, 0xfd, 0xfe, 0xff,
+ 0x55, 0x55, 0x55, 0x55 };
+ VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff6, 0xfff7, 0x66, 0x66,
+ 0x66, 0x66, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xc880, 0x4b4d,
++ 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1500000, 0x4204cccd,
+ 0x4204cccd, 0x4204cccd };
--__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
--vcvt_f32_f16 (float16x4_t __a)
-+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-+vcvt_n_u32_f32 (float32x2_t __a, const int __b)
- {
-- return __builtin_aarch64_float_extend_lo_v4sf (__a);
-+ return __builtin_aarch64_fcvtzuv2sf_uss (__a, __b);
- }
+@@ -60,6 +70,10 @@ void exec_vext (void)
+ clean_results ();
--__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
--vcvt_f64_f32 (float32x2_t __a)
-+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-+vcvtq_n_s32_f32 (float32x4_t __a, const int __b)
- {
-+ return __builtin_aarch64_fcvtzsv4sf (__a, __b);
-+}
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
++#ifdef FP16_SUPPORTED
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector1, buffer, , float, f, 32, 2);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
-- return __builtin_aarch64_float_extend_lo_v2df (__a);
-+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-+vcvtq_n_u32_f32 (float32x4_t __a, const int __b)
-+{
-+ return __builtin_aarch64_fcvtzuv4sf_uss (__a, __b);
- }
+@@ -74,6 +88,9 @@ void exec_vext (void)
+ VDUP(vector2, , uint, u, 64, 1, 0x88);
+ VDUP(vector2, , poly, p, 8, 8, 0x55);
+ VDUP(vector2, , poly, p, 16, 4, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, , float, f, 16, 4, 14.6f); /* 14.6f is 0x4b4d. */
++#endif
+ VDUP(vector2, , float, f, 32, 2, 33.6f);
--__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
--vcvt_high_f32_f16 (float16x8_t __a)
-+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-+vcvtq_n_s64_f64 (float64x2_t __a, const int __b)
- {
-- return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
-+ return __builtin_aarch64_fcvtzsv2df (__a, __b);
- }
+ VDUP(vector2, q, int, s, 8, 16, 0x11);
+@@ -86,6 +103,9 @@ void exec_vext (void)
+ VDUP(vector2, q, uint, u, 64, 2, 0x88);
+ VDUP(vector2, q, poly, p, 8, 16, 0x55);
+ VDUP(vector2, q, poly, p, 16, 8, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, q, float, f, 16, 8, 14.6f);
++#endif
+ VDUP(vector2, q, float, f, 32, 4, 33.2f);
--__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
--vcvt_high_f64_f32 (float32x4_t __a)
-+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-+vcvtq_n_u64_f64 (float64x2_t __a, const int __b)
- {
-- return __builtin_aarch64_vec_unpacks_hi_v4sf (__a);
-+ return __builtin_aarch64_fcvtzuv2df_uss (__a, __b);
- }
+ /* Choose arbitrary extract offsets. */
+@@ -99,6 +119,9 @@ void exec_vext (void)
+ TEST_VEXT(, uint, u, 64, 1, 0);
+ TEST_VEXT(, poly, p, 8, 8, 6);
+ TEST_VEXT(, poly, p, 16, 4, 2);
++#if defined (FP16_SUPPORTED)
++ TEST_VEXT(, float, f, 16, 4, 2);
++#endif
+ TEST_VEXT(, float, f, 32, 2, 1);
- /* vcvt (<u>int -> float) */
-@@ -14456,6 +13883,12 @@ vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
- return __builtin_aarch64_fmav2sf (__b, vdup_n_f32 (__c), __a);
- }
+ TEST_VEXT(q, int, s, 8, 16, 14);
+@@ -111,9 +134,16 @@ void exec_vext (void)
+ TEST_VEXT(q, uint, u, 64, 2, 1);
+ TEST_VEXT(q, poly, p, 8, 16, 12);
+ TEST_VEXT(q, poly, p, 16, 8, 6);
++#if defined (FP16_SUPPORTED)
++ TEST_VEXT(q, float, f, 16, 8, 7);
++#endif
+ TEST_VEXT(q, float, f, 32, 4, 3);
-+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-+vfma_n_f64 (float64x1_t __a, float64x1_t __b, float64_t __c)
-+{
-+ return (float64x1_t) {__b[0] * __c + __a[0]};
-+}
-+
- __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
- vfmaq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
- {
-@@ -14597,6 +14030,29 @@ vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c)
- return __builtin_aarch64_fmav2df (-__b, __c, __a);
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
+ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
}
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vfms_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
-+{
-+ return __builtin_aarch64_fmav2sf (-__b, vdup_n_f32 (__c), __a);
-+}
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfma.c
+@@ -3,11 +3,19 @@
+ #include "compute-ref-data.h"
+
+ #ifdef __ARM_FEATURE_FMA
+
-+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-+vfms_n_f64 (float64x1_t __a, float64x1_t __b, float64_t __c)
-+{
-+ return (float64x1_t) {-__b[0] * __c + __a[0]};
-+}
+ /* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x61c6, 0x61c8, 0x61ca, 0x61cc };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x6435, 0x6436, 0x6437, 0x6438,
++ 0x6439, 0x643a, 0x643b, 0x643c };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x4438ca3d, 0x44390a3d };
+-VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8, 0x4486deb8, 0x4486feb8 };
++VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x44869eb8, 0x4486beb8,
++ 0x4486deb8, 0x4486feb8 };
+ #ifdef __aarch64__
+-VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520, 0x40890ee1532b8520 };
++VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0x408906e1532b8520,
++ 0x40890ee1532b8520 };
+ #endif
+
+ #define TEST_MSG "VFMA/VFMAQ"
+@@ -44,6 +52,18 @@ void exec_vfma (void)
+ DECL_VARIABLE(VAR, float, 32, 4);
+ #endif
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector1, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector3, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 4);
++
++ DECL_VARIABLE(vector1, float, 16, 8);
++ DECL_VARIABLE(vector2, float, 16, 8);
++ DECL_VARIABLE(vector3, float, 16, 8);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vfmsq_n_f32 (float32x4_t __a, float32x4_t __b, float32_t __c)
-+{
-+ return __builtin_aarch64_fmav4sf (-__b, vdupq_n_f32 (__c), __a);
-+}
+ DECL_VFMA_VAR(vector1);
+ DECL_VFMA_VAR(vector2);
+ DECL_VFMA_VAR(vector3);
+@@ -52,6 +72,10 @@ void exec_vfma (void)
+ clean_results ();
+
+ /* Initialize input "vector1" from "buffer". */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector1, buffer, , float, f, 32, 2);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
+ #ifdef __aarch64__
+@@ -59,13 +83,21 @@ void exec_vfma (void)
+ #endif
+
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 9.3f);
++ VDUP(vector2, q, float, f, 16, 8, 29.7f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, 9.3f);
+ VDUP(vector2, q, float, f, 32, 4, 29.7f);
+ #ifdef __aarch64__
+ VDUP(vector2, q, float, f, 64, 2, 15.8f);
+ #endif
+-
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vfmsq_n_f64 (float64x2_t __a, float64x2_t __b, float64_t __c)
-+{
-+ return __builtin_aarch64_fmav2df (-__b, vdupq_n_f64 (__c), __a);
-+}
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector3, , float, f, 16, 4, 81.2f);
++ VDUP(vector3, q, float, f, 16, 8, 36.8f);
++#endif
+ VDUP(vector3, , float, f, 32, 2, 81.2f);
+ VDUP(vector3, q, float, f, 32, 4, 36.8f);
+ #ifdef __aarch64__
+@@ -73,12 +105,20 @@ void exec_vfma (void)
+ #endif
- /* vfms_lane */
+ /* Execute the tests. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VFMA(, float, f, 16, 4);
++ TEST_VFMA(q, float, f, 16, 8);
++#endif
+ TEST_VFMA(, float, f, 32, 2);
+ TEST_VFMA(q, float, f, 32, 4);
+ #ifdef __aarch64__
+ TEST_VFMA(q, float, f, 64, 2);
+ #endif
-@@ -18895,6 +18351,160 @@ vmulq_laneq_u32 (uint32x4_t __a, uint32x4_t __b, const int __lane)
- return __a * __aarch64_vget_lane_any (__b, __lane);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+ CHECK_VFMA_RESULTS (TEST_MSG, "");
}
-
-+/* vmul_n. */
+ #endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vmul_n_f32 (float32x2_t __a, float32_t __b)
-+{
-+ return __a * __b;
-+}
++#include <arm_fp16.h>
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vmulq_n_f32 (float32x4_t __a, float32_t __b)
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
+{
-+ return __a * __b;
-+}
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3944 /* 0.658203 */,
++ 0xcefa /* -27.906250 */,
++ 0x5369 /* 59.281250 */,
++ 0x35ba /* 0.357910 */,
++ 0xc574 /* -5.453125 */,
++ 0xc5e6 /* -5.898438 */,
++ 0x3f66 /* 1.849609 */,
++ 0x5665 /* 102.312500 */,
++ 0xc02d /* -2.087891 */,
++ 0x4d79 /* 21.890625 */,
++ 0x547b /* 71.687500 */,
++ 0xcdf0 /* -23.750000 */,
++ 0xc625 /* -6.144531 */,
++ 0x4cf9 /* 19.890625 */,
++ 0x7e00 /* nan */,
++ 0x7e00 /* nan */
++};
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vmulq_n_f64 (float64x2_t __a, float64_t __b)
-+{
-+ return __a * __b;
-+}
++#define TEST_MSG "VFMAH_F16"
++#define INSN_NAME vfmah_f16
+
-+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-+vmul_n_s16 (int16x4_t __a, int16_t __b)
-+{
-+ return __a * __b;
-+}
++#define EXPECTED expected
+
-+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-+vmulq_n_s16 (int16x8_t __a, int16_t __b)
-+{
-+ return __a * __b;
-+}
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-+vmul_n_s32 (int32x2_t __a, int32_t __b)
-+{
-+ return __a * __b;
-+}
++/* Include the template for ternary scalar operations. */
++#include "ternary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_lane_f16_1.c
+@@ -0,0 +1,908 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-+vmulq_n_s32 (int32x4_t __a, int32_t __b)
-+{
-+ return __a * __b;
-+}
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-+vmul_n_u16 (uint16x4_t __a, uint16_t __b)
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (123.4)
++#define A1 FP16_C (-5.8)
++#define A2 FP16_C (-0.0)
++#define A3 FP16_C (10)
++#define A4 FP16_C (123412.43)
++#define A5 FP16_C (-5.8)
++#define A6 FP16_C (90.8)
++#define A7 FP16_C (24)
++
++#define B0 FP16_C (23.4)
++#define B1 FP16_C (-5.8)
++#define B2 FP16_C (8.9)
++#define B3 FP16_C (4.0)
++#define B4 FP16_C (3.4)
++#define B5 FP16_C (-550.8)
++#define B6 FP16_C (-31.8)
++#define B7 FP16_C (20000.0)
++
++/* Expected results for vfma_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */};
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */ };
++
++/* Expected results for vfmaq_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */,
++ 0x7C00 /* A4 + B4 * B0. */,
++ 0xF24D /* A5 + B5 * B0. */,
++ 0xE11B /* A6 + B6 * B0. */,
++ 0x7C00 /* A7 + B7 * B0. */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */,
++ 0x7C00 /* A4 + B4 * B1. */,
++ 0x6A3B /* A5 + B5 * B1. */,
++ 0x5C4D /* A6 + B6 * B1. */,
++ 0xFC00 /* A7 + B7 * B1. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */,
++ 0x7C00 /* A4 + B4 * B2. */,
++ 0xECCB /* A5 + B5 * B2. */,
++ 0xDA01 /* A6 + B6 * B2. */,
++ 0x7C00 /* A7 + B7 * B2. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */,
++ 0x7C00 /* A4 + B4 * B3. */,
++ 0xE851 /* A5 + B5 * B3. */,
++ 0xD08C /* A6 + B6 * B3. */,
++ 0x7C00 /* A7 + B7 * B3. */ };
++
++/* Expected results for vfma_laneq. */
++VECT_VAR_DECL (expected0_laneq_static, hfloat, 16, 4) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */ };
++
++VECT_VAR_DECL (expected1_laneq_static, hfloat, 16, 4) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */ };
++
++VECT_VAR_DECL (expected2_laneq_static, hfloat, 16, 4) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */ };
++
++VECT_VAR_DECL (expected3_laneq_static, hfloat, 16, 4) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */ };
++
++VECT_VAR_DECL (expected4_laneq_static, hfloat, 16, 4) []
++ = { 0x5A58 /* A0 + B0 * B4. */,
++ 0xCE62 /* A1 + B1 * B4. */,
++ 0x4F91 /* A2 + B2 * B4. */,
++ 0x4DE6 /* A3 + B3 * B4. */ };
++
++VECT_VAR_DECL (expected5_laneq_static, hfloat, 16, 4) []
++ = { 0xF23D /* A0 + B0 * B5. */,
++ 0x6A3B /* A1 + B1 * B5. */,
++ 0xECCA /* A2 + B2 * B5. */,
++ 0xE849 /* A3 + B3 * B5. */ };
++
++VECT_VAR_DECL (expected6_laneq_static, hfloat, 16, 4) []
++ = { 0xE0DA /* A0 + B0 * B6. */,
++ 0x5995 /* A1 + B1 * B6. */,
++ 0xDC6C /* A2 + B2 * B6. */,
++ 0xD753 /* A3 + B3 * B6. */ };
++
++VECT_VAR_DECL (expected7_laneq_static, hfloat, 16, 4) []
++ = { 0x7C00 /* A0 + B0 * B7. */,
++ 0xFC00 /* A1 + B1 * B7. */,
++ 0x7C00 /* A2 + B2 * B7. */,
++ 0x7C00 /* A3 + B3 * B7. */ };
++
++/* Expected results for vfmaq_laneq. */
++VECT_VAR_DECL (expected0_laneq_static, hfloat, 16, 8) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */,
++ 0x7C00 /* A4 + B4 * B0. */,
++ 0xF24D /* A5 + B5 * B0. */,
++ 0xE11B /* A6 + B6 * B0. */,
++ 0x7C00 /* A7 + B7 * B0. */ };
++
++VECT_VAR_DECL (expected1_laneq_static, hfloat, 16, 8) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */,
++ 0x7C00 /* A4 + B4 * B1. */,
++ 0x6A3B /* A5 + B5 * B1. */,
++ 0x5C4D /* A6 + B6 * B1. */,
++ 0xFC00 /* A7 + B7 * B1. */ };
++
++VECT_VAR_DECL (expected2_laneq_static, hfloat, 16, 8) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */,
++ 0x7C00 /* A4 + B4 * B2. */,
++ 0xECCB /* A5 + B5 * B2. */,
++ 0xDA01 /* A6 + B6 * B2. */,
++ 0x7C00 /* A7 + B7 * B2. */ };
++
++VECT_VAR_DECL (expected3_laneq_static, hfloat, 16, 8) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */,
++ 0x7C00 /* A4 + B4 * B3. */,
++ 0xE851 /* A5 + B5 * B3. */,
++ 0xD08C /* A6 + B6 * B3. */,
++ 0x7C00 /* A7 + B7 * B3. */ };
++
++VECT_VAR_DECL (expected4_laneq_static, hfloat, 16, 8) []
++ = { 0x5A58 /* A0 + B0 * B4. */,
++ 0xCE62 /* A1 + B1 * B4. */,
++ 0x4F91 /* A2 + B2 * B4. */,
++ 0x4DE6 /* A3 + B3 * B4. */,
++ 0x7C00 /* A4 + B4 * B4. */,
++ 0xE757 /* A5 + B5 * B4. */,
++ 0xCC54 /* A6 + B6 * B4. */,
++ 0x7C00 /* A7 + B7 * B4. */ };
++
++VECT_VAR_DECL (expected5_laneq_static, hfloat, 16, 8) []
++ = { 0xF23D /* A0 + B0 * B5. */,
++ 0x6A3B /* A1 + B1 * B5. */,
++ 0xECCA /* A2 + B2 * B5. */,
++ 0xE849 /* A3 + B3 * B5. */,
++ 0x7C00 /* A4 + B4 * B5. */,
++ 0x7C00 /* A5 + B5 * B5. */,
++ 0x744D /* A6 + B6 * B5. */,
++ 0xFC00 /* A7 + B7 * B5. */ };
++
++VECT_VAR_DECL (expected6_laneq_static, hfloat, 16, 8) []
++ = { 0xE0DA /* A0 + B0 * B6. */,
++ 0x5995 /* A1 + B1 * B6. */,
++ 0xDC6C /* A2 + B2 * B6. */,
++ 0xD753 /* A3 + B3 * B6. */,
++ 0x7C00 /* A4 + B4 * B6. */,
++ 0x7447 /* A5 + B5 * B6. */,
++ 0x644E /* A6 + B6 * B6. */,
++ 0xFC00 /* A7 + B7 * B6. */ };
++
++VECT_VAR_DECL (expected7_laneq_static, hfloat, 16, 8) []
++ = { 0x7C00 /* A0 + B0 * B7. */,
++ 0xFC00 /* A1 + B1 * B7. */,
++ 0x7C00 /* A2 + B2 * B7. */,
++ 0x7C00 /* A3 + B3 * B7. */,
++ 0x7C00 /* A4 + B4 * B7. */,
++ 0xFC00 /* A5 + B5 * B7. */,
++ 0xFC00 /* A6 + B6 * B7. */,
++ 0x7C00 /* A7 + B7 * B7. */ };
++
++/* Expected results for vfms_lane. */
++VECT_VAR_DECL (expected0_fms_static, hfloat, 16, 4) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */ };
++
++VECT_VAR_DECL (expected1_fms_static, hfloat, 16, 4) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */ };
++
++VECT_VAR_DECL (expected2_fms_static, hfloat, 16, 4) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */ };
++
++VECT_VAR_DECL (expected3_fms_static, hfloat, 16, 4) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */ };
++
++/* Expected results for vfmsq_lane. */
++VECT_VAR_DECL (expected0_fms_static, hfloat, 16, 8) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */,
++ 0x7C00 /* A4 + (-B4) * B0. */,
++ 0x724B /* A5 + (-B5) * B0. */,
++ 0x6286 /* A6 + (-B6) * B0. */,
++ 0xFC00 /* A7 + (-B7) * B0. */ };
++
++VECT_VAR_DECL (expected1_fms_static, hfloat, 16, 8) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */,
++ 0x7C00 /* A4 + (-B4) * B1. */,
++ 0xEA41 /* A5 + (-B5) * B1. */,
++ 0xD5DA /* A6 + (-B6) * B1. */,
++ 0x7C00 /* A7 + (-B7) * B1. */ };
++
++VECT_VAR_DECL (expected2_fms_static, hfloat, 16, 8) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */,
++ 0x7C00 /* A4 + (-B4) * B2. */,
++ 0x6CC8 /* A5 + (-B5) * B2. */,
++ 0x5DD7 /* A6 + (-B6) * B2. */,
++ 0xFC00 /* A7 + (-B7) * B2. */ };
++
++VECT_VAR_DECL (expected3_fms_static, hfloat, 16, 8) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */,
++ 0x7C00 /* A4 + (-B4) * B3. */,
++ 0x684B /* A5 + (-B5) * B3. */,
++ 0x5AD0 /* A6 + (-B6) * B3. */,
++ 0xFC00 /* A7 + (-B7) * B3. */ };
++
++/* Expected results for vfms_laneq. */
++VECT_VAR_DECL (expected0_fms_laneq_static, hfloat, 16, 4) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */ };
++
++VECT_VAR_DECL (expected1_fms_laneq_static, hfloat, 16, 4) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */ };
++
++VECT_VAR_DECL (expected2_fms_laneq_static, hfloat, 16, 4) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */ };
++
++VECT_VAR_DECL (expected3_fms_laneq_static, hfloat, 16, 4) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */ };
++
++VECT_VAR_DECL (expected4_fms_laneq_static, hfloat, 16, 4) []
++ = { 0x5179 /* A0 + (-B0) * B4. */,
++ 0x4AF6 /* A1 + (-B1) * B4. */,
++ 0xCF91 /* A2 + (-B2) * B4. */,
++ 0xC334 /* A3 + (-B3) * B4. */ };
++
++VECT_VAR_DECL (expected5_fms_laneq_static, hfloat, 16, 4) []
++ = { 0x725C /* A0 + (-B0) * B5. */,
++ 0xEA41 /* A1 + (-B1) * B5. */,
++ 0x6CCA /* A2 + (-B2) * B5. */,
++ 0x6853 /* A3 + (-B3) * B5. */ };
++
++VECT_VAR_DECL (expected6_fms_laneq_static, hfloat, 16, 4) []
++ = { 0x62C7 /* A0 + (-B0) * B6. */,
++ 0xD9F2 /* A1 + (-B1) * B6. */,
++ 0x5C6C /* A2 + (-B2) * B6. */,
++ 0x584A /* A3 + (-B3) * B6. */ };
++
++VECT_VAR_DECL (expected7_fms_laneq_static, hfloat, 16, 4) []
++ = { 0xFC00 /* A0 + (-B0) * B7. */,
++ 0x7C00 /* A1 + (-B1) * B7. */,
++ 0xFC00 /* A2 + (-B2) * B7. */,
++ 0xFC00 /* A3 + (-B3) * B7. */ };
++
++/* Expected results for vfmsq_laneq. */
++VECT_VAR_DECL (expected0_fms_laneq_static, hfloat, 16, 8) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */,
++ 0x7C00 /* A4 + (-B4) * B0. */,
++ 0x724B /* A5 + (-B5) * B0. */,
++ 0x6286 /* A6 + (-B6) * B0. */,
++ 0xFC00 /* A7 + (-B7) * B0. */ };
++
++VECT_VAR_DECL (expected1_fms_laneq_static, hfloat, 16, 8) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */,
++ 0x7C00 /* A4 + (-B4) * B1. */,
++ 0xEA41 /* A5 + (-B5) * B1. */,
++ 0xD5DA /* A6 + (-B6) * B1. */,
++ 0x7C00 /* A7 + (-B7) * B1. */ };
++
++VECT_VAR_DECL (expected2_fms_laneq_static, hfloat, 16, 8) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */,
++ 0x7C00 /* A4 + (-B4) * B2. */,
++ 0x6CC8 /* A5 + (-B5) * B2. */,
++ 0x5DD7 /* A6 + (-B6) * B2. */,
++ 0xFC00 /* A7 + (-B7) * B2. */ };
++
++VECT_VAR_DECL (expected3_fms_laneq_static, hfloat, 16, 8) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */,
++ 0x7C00 /* A4 + (-B4) * B3. */,
++ 0x684B /* A5 + (-B5) * B3. */,
++ 0x5AD0 /* A6 + (-B6) * B3. */,
++ 0xFC00 /* A7 + (-B7) * B3. */ };
++
++VECT_VAR_DECL (expected4_fms_laneq_static, hfloat, 16, 8) []
++ = { 0x5179 /* A0 + (-B0) * B4. */,
++ 0x4AF6 /* A1 + (-B1) * B4. */,
++ 0xCF91 /* A2 + (-B2) * B4. */,
++ 0xC334 /* A3 + (-B3) * B4. */,
++ 0x7C00 /* A4 + (-B4) * B4. */,
++ 0x674C /* A5 + (-B5) * B4. */,
++ 0x5A37 /* A6 + (-B6) * B4. */,
++ 0xFC00 /* A7 + (-B7) * B4. */ };
++
++VECT_VAR_DECL (expected5_fms_laneq_static, hfloat, 16, 8) []
++ = { 0x725C /* A0 + (-B0) * B5. */,
++ 0xEA41 /* A1 + (-B1) * B5. */,
++ 0x6CCA /* A2 + (-B2) * B5. */,
++ 0x6853 /* A3 + (-B3) * B5. */,
++ 0x7C00 /* A4 + (-B4) * B5. */,
++ 0xFC00 /* A5 + (-B5) * B5. */,
++ 0xF441 /* A6 + (-B6) * B5. */,
++ 0x7C00 /* A7 + (-B7) * B5. */ };
++
++VECT_VAR_DECL (expected6_fms_laneq_static, hfloat, 16, 8) []
++ = { 0x62C7 /* A0 + (-B0) * B6. */,
++ 0xD9F2 /* A1 + (-B1) * B6. */,
++ 0x5C6C /* A2 + (-B2) * B6. */,
++ 0x584A /* A3 + (-B3) * B6. */,
++ 0x7C00 /* A4 + (-B4) * B6. */,
++ 0xF447 /* A5 + (-B5) * B6. */,
++ 0xE330 /* A6 + (-B6) * B6. */,
++ 0x7C00 /* A7 + (-B7) * B6. */ };
++
++VECT_VAR_DECL (expected7_fms_laneq_static, hfloat, 16, 8) []
++ = { 0xFC00 /* A0 + (-B0) * B7. */,
++ 0x7C00 /* A1 + (-B1) * B7. */,
++ 0xFC00 /* A2 + (-B2) * B7. */,
++ 0xFC00 /* A3 + (-B3) * B7. */,
++ 0x7C00 /* A4 + (-B4) * B7. */,
++ 0x7C00 /* A5 + (-B5) * B7. */,
++ 0x7C00 /* A6 + (-B6) * B7. */,
++ 0xFC00 /* A7 + (-B7) * B7. */ };
++
++void exec_vfmas_lane_f16 (void)
+{
-+ return __a * __b;
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMA_LANE (FP16)"
++ clean_results ();
+
-+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-+vmulq_n_u16 (uint16x8_t __a, uint16_t __b)
-+{
-+ return __a * __b;
-+}
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, A1, A2, A3};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {B0, B1, B2, B3};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
+
-+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-+vmul_n_u32 (uint32x2_t __a, uint32_t __b)
-+{
-+ return __a * __b;
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMAQ_LANE (FP16)"
++ clean_results ();
+
-+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-+vmulq_n_u32 (uint32x4_t __a, uint32_t __b)
-+{
-+ return __a * __b;
-+}
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, A1, A2, A3, A4, A5, A6, A7};
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+
-+/* vmvn */
++#undef TEST_MSG
++#define TEST_MSG "VFMA_LANEQ (FP16)"
++ clean_results ();
+
-+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-+vmvn_p8 (poly8x8_t __a)
-+{
-+ return (poly8x8_t) ~((int8x8_t) __a);
-+}
++ DECL_VARIABLE(vsrc_3, float, 16, 8);
++ VECT_VAR_DECL (buf_src_3, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
++ VLOAD (vsrc_3, buf_src_3, q, float, f, 16, 8);
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 4);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected4_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 5);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected5_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 6);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected6_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 7);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected7_laneq_static, "");
+
-+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-+vmvn_s8 (int8x8_t __a)
-+{
-+ return ~__a;
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMAQ_LANEQ (FP16)"
++ clean_results ();
+
-+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-+vmvn_s16 (int16x4_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected4_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected5_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected6_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected7_laneq_static, "");
+
-+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-+vmvn_s32 (int32x2_t __a)
-+{
-+ return ~__a;
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMS_LANE (FP16)"
++ clean_results ();
+
-+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-+vmvn_u8 (uint8x8_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-+vmvn_u16 (uint16x4_t __a)
-+{
-+ return ~__a;
-+}
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_fms_static, "");
+
-+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-+vmvn_u32 (uint32x2_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-+vmvnq_p8 (poly8x16_t __a)
-+{
-+ return (poly8x16_t) ~((int8x16_t) __a);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_fms_static, "");
+
-+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-+vmvnq_s8 (int8x16_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-+vmvnq_s16 (int16x8_t __a)
-+{
-+ return ~__a;
-+}
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_fms_static, "");
+
-+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-+vmvnq_s32 (int32x4_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-+vmvnq_u8 (uint8x16_t __a)
-+{
-+ return ~__a;
-+}
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_fms_static, "");
+
-+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-+vmvnq_u16 (uint16x8_t __a)
-+{
-+ return ~__a;
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMSQ_LANE (FP16)"
++ clean_results ();
+
-+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-+vmvnq_u32 (uint32x4_t __a)
-+{
-+ return ~__a;
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
- /* vneg */
-
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-@@ -18971,6 +18581,24 @@ vnegq_s64 (int64x2_t __a)
-
- /* vpadd */
-
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vpadd_f32 (float32x2_t __a, float32x2_t __b)
-+{
-+ return __builtin_aarch64_faddpv2sf (__a, __b);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_fms_static, "");
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vpaddq_f32 (float32x4_t __a, float32x4_t __b)
-+{
-+ return __builtin_aarch64_faddpv4sf (__a, __b);
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vpaddq_f64 (float64x2_t __a, float64x2_t __b)
-+{
-+ return __builtin_aarch64_faddpv2df (__a, __b);
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_fms_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_fms_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_fms_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VFMS_LANEQ (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 4);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected4_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 5);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected5_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 6);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected6_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4),
++ VECT_VAR (vsrc_3, float, 16, 8), 7);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected7_fms_laneq_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VFMSQ_LANEQ (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected4_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected5_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected6_fms_laneq_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8),
++ VECT_VAR (vsrc_3, float, 16, 8), 7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected7_fms_laneq_static, "");
+}
+
- __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
- vpadd_s8 (int8x8_t __a, int8x8_t __b)
- {
-@@ -19010,6 +18638,12 @@ vpadd_u32 (uint32x2_t __a, uint32x2_t __b)
- (int32x2_t) __b);
- }
-
-+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-+vpadds_f32 (float32x2_t __a)
++int
++main (void)
+{
-+ return __builtin_aarch64_reduc_plus_scal_v2sf (__a);
++ exec_vfmas_lane_f16 ();
++ return 0;
+}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmas_n_f16_1.c
+@@ -0,0 +1,469 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
- __extension__ static __inline float64_t __attribute__ ((__always_inline__))
- vpaddd_f64 (float64x2_t __a)
- {
-@@ -21713,6 +21347,83 @@ vrshrd_n_u64 (uint64_t __a, const int __b)
- return __builtin_aarch64_urshr_ndi_uus (__a, __b);
- }
-
-+/* vrsqrte. */
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-+vrsqrtes_f32 (float32_t __a)
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (123.4)
++#define A1 FP16_C (-5.8)
++#define A2 FP16_C (-0.0)
++#define A3 FP16_C (10)
++#define A4 FP16_C (123412.43)
++#define A5 FP16_C (-5.8)
++#define A6 FP16_C (90.8)
++#define A7 FP16_C (24)
++
++#define B0 FP16_C (23.4)
++#define B1 FP16_C (-5.8)
++#define B2 FP16_C (8.9)
++#define B3 FP16_C (4.0)
++#define B4 FP16_C (3.4)
++#define B5 FP16_C (-550.8)
++#define B6 FP16_C (-31.8)
++#define B7 FP16_C (20000.0)
++
++/* Expected results for vfma_n. */
++VECT_VAR_DECL (expected_fma0_static, hfloat, 16, 4) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */ };
++
++VECT_VAR_DECL (expected_fma1_static, hfloat, 16, 4) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */ };
++
++VECT_VAR_DECL (expected_fma2_static, hfloat, 16, 4) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */ };
++
++VECT_VAR_DECL (expected_fma3_static, hfloat, 16, 4) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */ };
++
++VECT_VAR_DECL (expected_fma0_static, hfloat, 16, 8) []
++ = { 0x613E /* A0 + B0 * B0. */,
++ 0xD86D /* A1 + B1 * B0. */,
++ 0x5A82 /* A2 + B2 * B0. */,
++ 0x567A /* A3 + B3 * B0. */,
++ 0x7C00 /* A4 + B4 * B0. */,
++ 0xF24D /* A5 + B5 * B0. */,
++ 0xE11B /* A6 + B6 * B0. */,
++ 0x7C00 /* A7 + B7 * B0. */ };
++
++VECT_VAR_DECL (expected_fma1_static, hfloat, 16, 8) []
++ = { 0xCA33 /* A0 + B0 * B1. */,
++ 0x4EF6 /* A1 + B1 * B1. */,
++ 0xD274 /* A2 + B2 * B1. */,
++ 0xCA9A /* A3 + B3 * B1. */,
++ 0x7C00 /* A4 + B4 * B1. */,
++ 0x6A3B /* A5 + B5 * B1. */,
++ 0x5C4D /* A6 + B6 * B1. */,
++ 0xFC00 /* A7 + B7 * B1. */ };
++
++VECT_VAR_DECL (expected_fma2_static, hfloat, 16, 8) []
++ = { 0x5D2F /* A0 + B0 * B2. */,
++ 0xD32D /* A1 + B1 * B2. */,
++ 0x54F3 /* A2 + B2 * B2. */,
++ 0x51B3 /* A3 + B3 * B2. */,
++ 0x7C00 /* A4 + B4 * B2. */,
++ 0xECCB /* A5 + B5 * B2. */,
++ 0xDA01 /* A6 + B6 * B2. */,
++ 0x7C00 /* A7 + B7 * B2. */ };
++
++VECT_VAR_DECL (expected_fma3_static, hfloat, 16, 8) []
++ = { 0x5AC8 /* A0 + B0 * B3. */,
++ 0xCF40 /* A1 + B1 * B3. */,
++ 0x5073 /* A2 + B2 * B3. */,
++ 0x4E80 /* A3 + B3 * B3. */,
++ 0x7C00 /* A4 + B4 * B3. */,
++ 0xE851 /* A5 + B5 * B3. */,
++ 0xD08C /* A6 + B6 * B3. */,
++ 0x7C00 /* A7 + B7 * B3. */ };
++
++VECT_VAR_DECL (expected_fma4_static, hfloat, 16, 8) []
++ = { 0x5A58 /* A0 + B0 * B4. */,
++ 0xCE62 /* A1 + B1 * B4. */,
++ 0x4F91 /* A2 + B2 * B4. */,
++ 0x4DE6 /* A3 + B3 * B4. */,
++ 0x7C00 /* A4 + B4 * B4. */,
++ 0xE757 /* A5 + B5 * B4. */,
++ 0xCC54 /* A6 + B6 * B4. */,
++ 0x7C00 /* A7 + B7 * B4. */ };
++
++VECT_VAR_DECL (expected_fma5_static, hfloat, 16, 8) []
++ = { 0xF23D /* A0 + B0 * B5. */,
++ 0x6A3B /* A1 + B1 * B5. */,
++ 0xECCA /* A2 + B2 * B5. */,
++ 0xE849 /* A3 + B3 * B5. */,
++ 0x7C00 /* A4 + B4 * B5. */,
++ 0x7C00 /* A5 + B5 * B5. */,
++ 0x744D /* A6 + B6 * B5. */,
++ 0xFC00 /* A7 + B7 * B5. */ };
++
++VECT_VAR_DECL (expected_fma6_static, hfloat, 16, 8) []
++ = { 0xE0DA /* A0 + B0 * B6. */,
++ 0x5995 /* A1 + B1 * B6. */,
++ 0xDC6C /* A2 + B2 * B6. */,
++ 0xD753 /* A3 + B3 * B6. */,
++ 0x7C00 /* A4 + B4 * B6. */,
++ 0x7447 /* A5 + B5 * B6. */,
++ 0x644E /* A6 + B6 * B6. */,
++ 0xFC00 /* A7 + B7 * B6. */ };
++
++VECT_VAR_DECL (expected_fma7_static, hfloat, 16, 8) []
++ = { 0x7C00 /* A0 + B0 * B7. */,
++ 0xFC00 /* A1 + B1 * B7. */,
++ 0x7C00 /* A2 + B2 * B7. */,
++ 0x7C00 /* A3 + B3 * B7. */,
++ 0x7C00 /* A4 + B4 * B7. */,
++ 0xFC00 /* A5 + B5 * B7. */,
++ 0xFC00 /* A6 + B6 * B7. */,
++ 0x7C00 /* A7 + B7 * B7. */ };
++
++/* Expected results for vfms_n. */
++VECT_VAR_DECL (expected_fms0_static, hfloat, 16, 4) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */ };
++
++VECT_VAR_DECL (expected_fms1_static, hfloat, 16, 4) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */ };
++
++VECT_VAR_DECL (expected_fms2_static, hfloat, 16, 4) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */ };
++
++VECT_VAR_DECL (expected_fms3_static, hfloat, 16, 4) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */ };
++
++VECT_VAR_DECL (expected_fms0_static, hfloat, 16, 8) []
++ = { 0xDEA2 /* A0 + (-B0) * B0. */,
++ 0x5810 /* A1 + (-B1) * B0. */,
++ 0xDA82 /* A2 + (-B2) * B0. */,
++ 0xD53A /* A3 + (-B3) * B0. */,
++ 0x7C00 /* A4 + (-B4) * B0. */,
++ 0x724B /* A5 + (-B5) * B0. */,
++ 0x6286 /* A6 + (-B6) * B0. */,
++ 0xFC00 /* A7 + (-B7) * B0. */ };
++
++VECT_VAR_DECL (expected_fms1_static, hfloat, 16, 8) []
++ = { 0x5C0D /* A0 + (-B0) * B1. */,
++ 0xD0EE /* A1 + (-B1) * B1. */,
++ 0x5274 /* A2 + (-B2) * B1. */,
++ 0x5026 /* A3 + (-B3) * B1. */,
++ 0x7C00 /* A4 + (-B4) * B1. */,
++ 0xEA41 /* A5 + (-B5) * B1. */,
++ 0xD5DA /* A6 + (-B6) * B1. */,
++ 0x7C00 /* A7 + (-B7) * B1. */ };
++
++VECT_VAR_DECL (expected_fms2_static, hfloat, 16, 8) []
++ = { 0xD54E /* A0 + (-B0) * B2. */,
++ 0x51BA /* A1 + (-B1) * B2. */,
++ 0xD4F3 /* A2 + (-B2) * B2. */,
++ 0xCE66 /* A3 + (-B3) * B2. */,
++ 0x7C00 /* A4 + (-B4) * B2. */,
++ 0x6CC8 /* A5 + (-B5) * B2. */,
++ 0x5DD7 /* A6 + (-B6) * B2. */,
++ 0xFC00 /* A7 + (-B7) * B2. */ };
++
++VECT_VAR_DECL (expected_fms3_static, hfloat, 16, 8) []
++ = { 0x4F70 /* A0 + (-B0) * B3. */,
++ 0x4C5A /* A1 + (-B1) * B3. */,
++ 0xD073 /* A2 + (-B2) * B3. */,
++ 0xC600 /* A3 + (-B3) * B3. */,
++ 0x7C00 /* A4 + (-B4) * B3. */,
++ 0x684B /* A5 + (-B5) * B3. */,
++ 0x5AD0 /* A6 + (-B6) * B3. */,
++ 0xFC00 /* A7 + (-B7) * B3. */ };
++
++VECT_VAR_DECL (expected_fms4_static, hfloat, 16, 8) []
++ = { 0x5179 /* A0 + (-B0) * B4. */,
++ 0x4AF6 /* A1 + (-B1) * B4. */,
++ 0xCF91 /* A2 + (-B2) * B4. */,
++ 0xC334 /* A3 + (-B3) * B4. */,
++ 0x7C00 /* A4 + (-B4) * B4. */,
++ 0x674C /* A5 + (-B5) * B4. */,
++ 0x5A37 /* A6 + (-B6) * B4. */,
++ 0xFC00 /* A7 + (-B7) * B4. */ };
++
++VECT_VAR_DECL (expected_fms5_static, hfloat, 16, 8) []
++ = { 0x725C /* A0 + (-B0) * B5. */,
++ 0xEA41 /* A1 + (-B1) * B5. */,
++ 0x6CCA /* A2 + (-B2) * B5. */,
++ 0x6853 /* A3 + (-B3) * B5. */,
++ 0x7C00 /* A4 + (-B4) * B5. */,
++ 0xFC00 /* A5 + (-B5) * B5. */,
++ 0xF441 /* A6 + (-B6) * B5. */,
++ 0x7C00 /* A7 + (-B7) * B5. */ };
++
++VECT_VAR_DECL (expected_fms6_static, hfloat, 16, 8) []
++ = { 0x62C7 /* A0 + (-B0) * B6. */,
++ 0xD9F2 /* A1 + (-B1) * B6. */,
++ 0x5C6C /* A2 + (-B2) * B6. */,
++ 0x584A /* A3 + (-B3) * B6. */,
++ 0x7C00 /* A4 + (-B4) * B6. */,
++ 0xF447 /* A5 + (-B5) * B6. */,
++ 0xE330 /* A6 + (-B6) * B6. */,
++ 0x7C00 /* A7 + (-B7) * B6. */ };
++
++VECT_VAR_DECL (expected_fms7_static, hfloat, 16, 8) []
++ = { 0xFC00 /* A0 + (-B0) * B7. */,
++ 0x7C00 /* A1 + (-B1) * B7. */,
++ 0xFC00 /* A2 + (-B2) * B7. */,
++ 0xFC00 /* A3 + (-B3) * B7. */,
++ 0x7C00 /* A4 + (-B4) * B7. */,
++ 0x7C00 /* A5 + (-B5) * B7. */,
++ 0x7C00 /* A6 + (-B6) * B7. */,
++ 0xFC00 /* A7 + (-B7) * B7. */ };
++
++void exec_vfmas_n_f16 (void)
+{
-+ return __builtin_aarch64_rsqrtesf (__a);
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMA_N (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A0, A1, A2, A3};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {B0, B1, B2, B3};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B0);
++
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfma_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fma3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VFMAQ_N (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A0, A1, A2, A3, A4, A5, A6, A7};
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {B0, B1, B2, B3, B4, B5, B6, B7};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma3_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma4_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma5_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma6_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmaq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fma7_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VFMA_N (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B0);
++
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vfms_n_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), B3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_fms3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VFMAQ_N (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-+vrsqrted_f64 (float64_t __a)
-+{
-+ return __builtin_aarch64_rsqrtedf (__a);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms3_static, "");
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vrsqrte_f32 (float32x2_t __a)
-+{
-+ return __builtin_aarch64_rsqrtev2sf (__a);
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-+vrsqrte_f64 (float64x1_t __a)
-+{
-+ return (float64x1_t) {vrsqrted_f64 (vget_lane_f64 (__a, 0))};
-+}
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms4_static, "");
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vrsqrteq_f32 (float32x4_t __a)
-+{
-+ return __builtin_aarch64_rsqrtev4sf (__a);
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vrsqrteq_f64 (float64x2_t __a)
-+{
-+ return __builtin_aarch64_rsqrtev2df (__a);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms5_static, "");
+
-+/* vrsqrts. */
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-+vrsqrtss_f32 (float32_t __a, float32_t __b)
-+{
-+ return __builtin_aarch64_rsqrtssf (__a, __b);
-+}
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms6_static, "");
+
-+__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-+vrsqrtsd_f64 (float64_t __a, float64_t __b)
-+{
-+ return __builtin_aarch64_rsqrtsdf (__a, __b);
-+}
++ VECT_VAR (vector_res, float, 16, 8)
++ = vfmsq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), B7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-+vrsqrts_f32 (float32x2_t __a, float32x2_t __b)
-+{
-+ return __builtin_aarch64_rsqrtsv2sf (__a, __b);
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_fms7_static, "");
+}
+
-+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
-+vrsqrts_f64 (float64x1_t __a, float64x1_t __b)
++int
++main (void)
+{
-+ return (float64x1_t) {vrsqrtsd_f64 (vget_lane_f64 (__a, 0),
-+ vget_lane_f64 (__b, 0))};
++ exec_vfmas_n_f16 ();
++ return 0;
+}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmash_lane_f16_1.c
+@@ -0,0 +1,143 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-+vrsqrtsq_f32 (float32x4_t __a, float32x4_t __b)
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (123.4)
++#define B0 FP16_C (-5.8)
++#define C0 FP16_C (-3.8)
++#define D0 FP16_C (10)
++
++#define A1 FP16_C (12.4)
++#define B1 FP16_C (-5.8)
++#define C1 FP16_C (90.8)
++#define D1 FP16_C (24)
++
++#define A2 FP16_C (23.4)
++#define B2 FP16_C (-5.8)
++#define C2 FP16_C (8.9)
++#define D2 FP16_C (4)
++
++#define E0 FP16_C (3.4)
++#define F0 FP16_C (-55.8)
++#define G0 FP16_C (-31.8)
++#define H0 FP16_C (2)
++
++#define E1 FP16_C (123.4)
++#define F1 FP16_C (-5.8)
++#define G1 FP16_C (-3.8)
++#define H1 FP16_C (102)
++
++#define E2 FP16_C (4.9)
++#define F2 FP16_C (-15.8)
++#define G2 FP16_C (39.8)
++#define H2 FP16_C (49)
++
++extern void abort ();
++
++float16_t src1[8] = { A0, B0, C0, D0, E0, F0, G0, H0 };
++float16_t src2[8] = { A1, B1, C1, D1, E1, F1, G1, H1 };
++VECT_VAR_DECL (src3, float, 16, 4) [] = { A2, B2, C2, D2 };
++VECT_VAR_DECL (src3, float, 16, 8) [] = { A2, B2, C2, D2, E2, F2, G2, H2 };
++
++/* Expected results for vfmah_lane_f16. */
++uint16_t expected[4] = { 0x5E76 /* A0 + A1 * A2. */,
++ 0x4EF6 /* B0 + B1 * B2. */,
++ 0x6249 /* C0 + C1 * C2. */,
++ 0x56A0 /* D0 + D1 * D2. */ };
++
++/* Expected results for vfmah_laneq_f16. */
++uint16_t expected_laneq[8] = { 0x5E76 /* A0 + A1 * A2. */,
++ 0x4EF6 /* B0 + B1 * B2. */,
++ 0x6249 /* C0 + C1 * C2. */,
++ 0x56A0 /* D0 + D1 * D2. */,
++ 0x60BF /* E0 + E1 * E2. */,
++ 0x507A /* F0 + F1 * F2. */,
++ 0xD9B9 /* G0 + G1 * G2. */,
++ 0x6CE2 /* H0 + H1 * H2. */ };
++
++/* Expected results for vfmsh_lane_f16. */
++uint16_t expected_fms[4] = { 0xD937 /* A0 + -A1 * A2. */,
++ 0xD0EE /* B0 + -B1 * B2. */,
++ 0xE258 /* C0 + -C1 * C2. */,
++ 0xD560 /* D0 + -D1 * D2. */ };
++
++/* Expected results for vfmsh_laneq_f16. */
++uint16_t expected_fms_laneq[8] = { 0xD937 /* A0 + -A1 * A2. */,
++ 0xD0EE /* B0 + -B1 * B2. */,
++ 0xE258 /* C0 + -C1 * C2. */,
++ 0xD560 /* D0 + -D1 * D2. */,
++ 0xE0B2 /* E0 + -E1 * E2. */,
++ 0xD89C /* F0 + -F1 * F2. */,
++ 0x5778 /* G0 + -G1 * G2. */,
++ 0xECE1 /* H0 + -H1 * H2. */ };
++
++void exec_vfmash_lane_f16 (void)
+{
-+ return __builtin_aarch64_rsqrtsv4sf (__a, __b);
++#define CHECK_LANE(N) \
++ ret = vfmah_lane_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 4), N);\
++ if (*(uint16_t *) &ret != expected[N])\
++ abort ();
++
++ DECL_VARIABLE(vsrc3, float, 16, 4);
++ VLOAD (vsrc3, src3, , float, f, 16, 4);
++ float16_t ret;
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++
++#undef CHECK_LANE
++#define CHECK_LANE(N) \
++ ret = vfmah_laneq_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 8), N);\
++ if (*(uint16_t *) &ret != expected_laneq[N]) \
++ abort ();
++
++ DECL_VARIABLE(vsrc3, float, 16, 8);
++ VLOAD (vsrc3, src3, q, float, f, 16, 8);
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++ CHECK_LANE(4)
++ CHECK_LANE(5)
++ CHECK_LANE(6)
++ CHECK_LANE(7)
++
++#undef CHECK_LANE
++#define CHECK_LANE(N) \
++ ret = vfmsh_lane_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 4), N);\
++ if (*(uint16_t *) &ret != expected_fms[N])\
++ abort ();
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++
++#undef CHECK_LANE
++#define CHECK_LANE(N) \
++ ret = vfmsh_laneq_f16 (src1[N], src2[N], VECT_VAR (vsrc3, float, 16, 8), N);\
++ if (*(uint16_t *) &ret != expected_fms_laneq[N]) \
++ abort ();
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++ CHECK_LANE(4)
++ CHECK_LANE(5)
++ CHECK_LANE(6)
++ CHECK_LANE(7)
+}
+
-+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-+vrsqrtsq_f64 (float64x2_t __a, float64x2_t __b)
++int
++main (void)
+{
-+ return __builtin_aarch64_rsqrtsv2df (__a, __b);
++ exec_vfmash_lane_f16 ();
++ return 0;
+}
-+
- /* vrsra */
-
- __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
---- a/src/gcc/config/aarch64/iterators.md
-+++ b/src/gcc/config/aarch64/iterators.md
-@@ -154,6 +154,12 @@
- ;; Vector modes for S type.
- (define_mode_iterator VDQ_SI [V2SI V4SI])
-
-+;; Vector modes for S and D
-+(define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
-+
-+;; Scalar and Vector modes for S and D
-+(define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
-+
- ;; Vector modes for Q and H types.
- (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
-
-@@ -648,8 +654,13 @@
- (define_mode_attr atomic_sfx
- [(QI "b") (HI "h") (SI "") (DI "")])
-
--(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si") (SF "si") (DF "di")])
--(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI") (SF "SI") (DF "DI")])
-+(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")
-+ (V2DI "v2df") (V4SI "v4sf") (V2SI "v2sf")
-+ (SF "si") (DF "di") (SI "sf") (DI "df")])
-+(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")
-+ (V2DI "V2DF") (V4SI "V4SF") (V2SI "V2SF")
-+ (SF "SI") (DF "DI") (SI "SF") (DI "DF")])
-+
-
- ;; for the inequal width integer to fp conversions
- (define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
-@@ -715,6 +726,7 @@
- (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
- (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms.c
+@@ -4,10 +4,17 @@
-+;; Sum of lengths of instructions needed to move vector registers of a mode.
- (define_mode_attr insn_count [(OI "8") (CI "12") (XI "16")])
+ #ifdef __ARM_FEATURE_FMA
+ /* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xe206, 0xe204, 0xe202, 0xe200 };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe455, 0xe454, 0xe453, 0xe452,
++ 0xe451, 0xe450, 0xe44f, 0xe44e };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc440ca3d, 0xc4408a3d };
+-VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc48a9eb8, 0xc48a7eb8, 0xc48a5eb8, 0xc48a3eb8 };
++VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc48a9eb8, 0xc48a7eb8,
++ 0xc48a5eb8, 0xc48a3eb8 };
+ #ifdef __aarch64__
+-VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0xc08a06e1532b8520, 0xc089fee1532b8520 };
++VECT_VAR_DECL(expected,hfloat,64,2) [] = { 0xc08a06e1532b8520,
++ 0xc089fee1532b8520 };
+ #endif
- ;; -fpic small model GOT reloc modifers: gotpage_lo15/lo14 for ILP64/32.
-@@ -1001,6 +1013,9 @@
- (define_int_iterator FCVT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
- UNSPEC_FRINTA UNSPEC_FRINTN])
+ #define TEST_MSG "VFMS/VFMSQ"
+@@ -44,6 +51,18 @@ void exec_vfms (void)
+ DECL_VARIABLE(VAR, float, 32, 4);
+ #endif
-+(define_int_iterator FCVT_F2FIXED [UNSPEC_FCVTZS UNSPEC_FCVTZU])
-+(define_int_iterator FCVT_FIXED2F [UNSPEC_SCVTF UNSPEC_UCVTF])
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector1, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector3, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 4);
++
++ DECL_VARIABLE(vector1, float, 16, 8);
++ DECL_VARIABLE(vector2, float, 16, 8);
++ DECL_VARIABLE(vector3, float, 16, 8);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+
- (define_int_iterator FRECP [UNSPEC_FRECPE UNSPEC_FRECPX])
+ DECL_VFMS_VAR(vector1);
+ DECL_VFMS_VAR(vector2);
+ DECL_VFMS_VAR(vector3);
+@@ -52,6 +71,10 @@ void exec_vfms (void)
+ clean_results ();
- (define_int_iterator CRC [UNSPEC_CRC32B UNSPEC_CRC32H UNSPEC_CRC32W
-@@ -1137,6 +1152,11 @@
- (UNSPEC_FRINTP "ceil") (UNSPEC_FRINTM "floor")
- (UNSPEC_FRINTN "frintn")])
+ /* Initialize input "vector1" from "buffer". */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector1, buffer, , float, f, 32, 2);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
+ #ifdef __aarch64__
+@@ -59,13 +82,21 @@ void exec_vfms (void)
+ #endif
-+(define_int_attr fcvt_fixed_insn [(UNSPEC_SCVTF "scvtf")
-+ (UNSPEC_UCVTF "ucvtf")
-+ (UNSPEC_FCVTZS "fcvtzs")
-+ (UNSPEC_FCVTZU "fcvtzu")])
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 9.3f);
++ VDUP(vector2, q, float, f, 16, 8, 29.7f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, 9.3f);
+ VDUP(vector2, q, float, f, 32, 4, 29.7f);
+ #ifdef __aarch64__
+ VDUP(vector2, q, float, f, 64, 2, 15.8f);
+ #endif
+-
+
- (define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
- (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
- (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
---- a/src/gcc/config/arm/arm-protos.h
-+++ b/src/gcc/config/arm/arm-protos.h
-@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
- ATTRIBUTE_UNUSED);
- extern void arm_init_builtins (void);
- extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
--
-+extern rtx arm_simd_vect_par_cnst_half (machine_mode mode, bool high);
-+extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
-+ bool high);
- #ifdef RTX_CODE
- extern bool arm_vector_mode_supported_p (machine_mode);
- extern bool arm_small_register_classes_for_mode_p (machine_mode);
-@@ -319,6 +321,7 @@ extern int vfp3_const_double_for_bits (rtx);
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector3, , float, f, 16, 4, 81.2f);
++ VDUP(vector3, q, float, f, 16, 8, 36.8f);
++#endif
+ VDUP(vector3, , float, f, 32, 2, 81.2f);
+ VDUP(vector3, q, float, f, 32, 4, 36.8f);
+ #ifdef __aarch64__
+@@ -73,12 +104,20 @@ void exec_vfms (void)
+ #endif
- extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
- rtx);
-+extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
- extern bool arm_valid_symbolic_address_p (rtx);
- extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
- #endif /* RTX_CODE */
-@@ -601,6 +604,9 @@ extern int arm_tune_cortex_a9;
- interworking clean. */
- extern int arm_cpp_interwork;
+ /* Execute the tests. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VFMS(, float, f, 16, 4);
++ TEST_VFMS(q, float, f, 16, 8);
++#endif
+ TEST_VFMS(, float, f, 32, 2);
+ TEST_VFMS(q, float, f, 32, 4);
+ #ifdef __aarch64__
+ TEST_VFMS(q, float, f, 64, 2);
+ #endif
-+/* Nonzero if chip supports Thumb 1. */
-+extern int arm_arch_thumb1;
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+ CHECK_VFMS_RESULTS (TEST_MSG, "");
+ }
+ #endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_vfma_n.c
+@@ -0,0 +1,490 @@
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
- /* Nonzero if chip supports Thumb 2. */
- extern int arm_arch_thumb2;
-
---- a/src/gcc/config/arm/arm.c
-+++ b/src/gcc/config/arm/arm.c
-@@ -852,6 +852,9 @@ int arm_tune_cortex_a9 = 0;
- interworking clean. */
- int arm_cpp_interwork = 0;
-
-+/* Nonzero if chip supports Thumb 1. */
-+int arm_arch_thumb1;
++#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
+
- /* Nonzero if chip supports Thumb 2. */
- int arm_arch_thumb2;
-
-@@ -3170,6 +3173,7 @@ arm_option_override (void)
- arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
- arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
- arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
-+ arm_arch_thumb1 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
- arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
- arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
-
-@@ -10759,8 +10763,6 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
- if ((arm_arch4 || GET_MODE (XEXP (x, 0)) == SImode)
- && MEM_P (XEXP (x, 0)))
- {
-- *cost = rtx_cost (XEXP (x, 0), VOIDmode, code, 0, speed_p);
--
- if (mode == DImode)
- *cost += COSTS_N_INSNS (1);
-
-@@ -15981,14 +15983,17 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
- /* If the same input register is used in both stores
- when storing different constants, try to find a free register.
- For example, the code
-- mov r0, 0
-- str r0, [r2]
-- mov r0, 1
-- str r0, [r2, #4]
-+ mov r0, 0
-+ str r0, [r2]
-+ mov r0, 1
-+ str r0, [r2, #4]
- can be transformed into
-- mov r1, 0
-- strd r1, r0, [r2]
-- in Thumb mode assuming that r1 is free. */
-+ mov r1, 0
-+ mov r0, 1
-+ strd r1, r0, [r2]
-+ in Thumb mode assuming that r1 is free.
-+ For ARM mode do the same but only if the starting register
-+ can be made to be even. */
- if (const_store
- && REGNO (operands[0]) == REGNO (operands[1])
- && INTVAL (operands[4]) != INTVAL (operands[5]))
-@@ -16007,7 +16012,6 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
- }
- else if (TARGET_ARM)
- {
-- return false;
- int regno = REGNO (operands[0]);
- if (!peep2_reg_dead_p (4, operands[0]))
- {
-@@ -29801,6 +29805,13 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
- return false;
- }
-
-+/* Return true iff the instruction fusion described by OP is enabled. */
-+bool
-+arm_fusion_enabled_p (tune_params::fuse_ops op)
-+{
-+ return current_tune->fusible_ops & op;
-+}
++#define A0 123.4f
++#define A1 -3.8f
++#define A2 -29.4f
++#define A3 (__builtin_inff ())
++#define A4 0.0f
++#define A5 24.0f
++#define A6 124.0f
++#define A7 1024.0f
+
- /* Implement the TARGET_ASAN_SHADOW_OFFSET hook. */
-
- static unsigned HOST_WIDE_INT
-@@ -30311,4 +30322,80 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
- return;
- }
-
++#define B0 -5.8f
++#define B1 -0.0f
++#define B2 -10.8f
++#define B3 10.0f
++#define B4 23.4f
++#define B5 -1234.8f
++#define B6 8.9f
++#define B7 4.0f
+
-+/* Construct and return a PARALLEL RTX vector with elements numbering the
-+ lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
-+ the vector - from the perspective of the architecture. This does not
-+ line up with GCC's perspective on lane numbers, so we end up with
-+ different masks depending on our target endian-ness. The diagram
-+ below may help. We must draw the distinction when building masks
-+ which select one half of the vector. An instruction selecting
-+ architectural low-lanes for a big-endian target, must be described using
-+ a mask selecting GCC high-lanes.
++#define E0 9.8f
++#define E1 -1024.0f
++#define E2 (-__builtin_inff ())
++#define E3 479.0f
++float32_t elem0 = E0;
++float32_t elem1 = E1;
++float32_t elem2 = E2;
++float32_t elem3 = E3;
+
-+ Big-Endian Little-Endian
++#define DA0 1231234.4
++#define DA1 -3.8
++#define DA2 -2980.4
++#define DA3 -5.8
++#define DA4 0.01123
++#define DA5 24.0
++#define DA6 124.12345
++#define DA7 1024.0
+
-+GCC 0 1 2 3 3 2 1 0
-+ | x | x | x | x | | x | x | x | x |
-+Architecture 3 2 1 0 3 2 1 0
++#define DB0 -5.8
++#define DB1 (__builtin_inf ())
++#define DB2 -105.8
++#define DB3 10.0
++#define DB4 (-__builtin_inf ())
++#define DB5 -1234.8
++#define DB6 848.9
++#define DB7 44444.0
+
-+Low Mask: { 2, 3 } { 0, 1 }
-+High Mask: { 0, 1 } { 2, 3 }
-+*/
++#define DE0 9.8
++#define DE1 -1024.0
++#define DE2 105.8
++#define DE3 479.0
++float64_t delem0 = DE0;
++float64_t delem1 = DE1;
++float64_t delem2 = DE2;
++float64_t delem3 = DE3;
+
-+rtx
-+arm_simd_vect_par_cnst_half (machine_mode mode, bool high)
-+{
-+ int nunits = GET_MODE_NUNITS (mode);
-+ rtvec v = rtvec_alloc (nunits / 2);
-+ int high_base = nunits / 2;
-+ int low_base = 0;
-+ int base;
-+ rtx t1;
-+ int i;
++/* Expected results for vfms_n. */
+
-+ if (BYTES_BIG_ENDIAN)
-+ base = high ? low_base : high_base;
-+ else
-+ base = high ? high_base : low_base;
++VECT_VAR_DECL(expectedfms0, float, 32, 2) [] = {A0 + -B0 * E0, A1 + -B1 * E0};
++VECT_VAR_DECL(expectedfms1, float, 32, 2) [] = {A2 + -B2 * E1, A3 + -B3 * E1};
++VECT_VAR_DECL(expectedfms2, float, 32, 2) [] = {A4 + -B4 * E2, A5 + -B5 * E2};
++VECT_VAR_DECL(expectedfms3, float, 32, 2) [] = {A6 + -B6 * E3, A7 + -B7 * E3};
++VECT_VAR_DECL(expectedfma0, float, 32, 2) [] = {A0 + B0 * E0, A1 + B1 * E0};
++VECT_VAR_DECL(expectedfma1, float, 32, 2) [] = {A2 + B2 * E1, A3 + B3 * E1};
++VECT_VAR_DECL(expectedfma2, float, 32, 2) [] = {A4 + B4 * E2, A5 + B5 * E2};
++VECT_VAR_DECL(expectedfma3, float, 32, 2) [] = {A6 + B6 * E3, A7 + B7 * E3};
+
-+ for (i = 0; i < nunits / 2; i++)
-+ RTVEC_ELT (v, i) = GEN_INT (base + i);
++hfloat32_t * VECT_VAR (expectedfms0_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfms0, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfms1_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfms1, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfms2_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfms2, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfms3_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfms3, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfma0_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfma0, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfma1_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfma1, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfma2_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfma2, float, 32, 2);
++hfloat32_t * VECT_VAR (expectedfma3_static, hfloat, 32, 2) =
++ (hfloat32_t *) VECT_VAR (expectedfma3, float, 32, 2);
+
-+ t1 = gen_rtx_PARALLEL (mode, v);
-+ return t1;
-+}
+
-+/* Check OP for validity as a PARALLEL RTX vector with elements
-+ numbering the lanes of either the high (HIGH == TRUE) or low lanes,
-+ from the perspective of the architecture. See the diagram above
-+ arm_simd_vect_par_cnst_half_p for more details. */
++VECT_VAR_DECL(expectedfms0, float, 32, 4) [] = {A0 + -B0 * E0, A1 + -B1 * E0,
++ A2 + -B2 * E0, A3 + -B3 * E0};
++VECT_VAR_DECL(expectedfms1, float, 32, 4) [] = {A4 + -B4 * E1, A5 + -B5 * E1,
++ A6 + -B6 * E1, A7 + -B7 * E1};
++VECT_VAR_DECL(expectedfms2, float, 32, 4) [] = {A0 + -B0 * E2, A2 + -B2 * E2,
++ A4 + -B4 * E2, A6 + -B6 * E2};
++VECT_VAR_DECL(expectedfms3, float, 32, 4) [] = {A1 + -B1 * E3, A3 + -B3 * E3,
++ A5 + -B5 * E3, A7 + -B7 * E3};
++VECT_VAR_DECL(expectedfma0, float, 32, 4) [] = {A0 + B0 * E0, A1 + B1 * E0,
++ A2 + B2 * E0, A3 + B3 * E0};
++VECT_VAR_DECL(expectedfma1, float, 32, 4) [] = {A4 + B4 * E1, A5 + B5 * E1,
++ A6 + B6 * E1, A7 + B7 * E1};
++VECT_VAR_DECL(expectedfma2, float, 32, 4) [] = {A0 + B0 * E2, A2 + B2 * E2,
++ A4 + B4 * E2, A6 + B6 * E2};
++VECT_VAR_DECL(expectedfma3, float, 32, 4) [] = {A1 + B1 * E3, A3 + B3 * E3,
++ A5 + B5 * E3, A7 + B7 * E3};
+
-+bool
-+arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
-+ bool high)
-+{
-+ rtx ideal = arm_simd_vect_par_cnst_half (mode, high);
-+ HOST_WIDE_INT count_op = XVECLEN (op, 0);
-+ HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
-+ int i = 0;
++hfloat32_t * VECT_VAR (expectedfms0_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfms0, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfms1_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfms1, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfms2_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfms2, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfms3_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfms3, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfma0_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfma0, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfma1_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfma1, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfma2_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfma2, float, 32, 4);
++hfloat32_t * VECT_VAR (expectedfma3_static, hfloat, 32, 4) =
++ (hfloat32_t *) VECT_VAR (expectedfma3, float, 32, 4);
+
-+ if (!VECTOR_MODE_P (mode))
-+ return false;
++VECT_VAR_DECL(expectedfms0, float, 64, 2) [] = {DA0 + -DB0 * DE0,
++ DA1 + -DB1 * DE0};
++VECT_VAR_DECL(expectedfms1, float, 64, 2) [] = {DA2 + -DB2 * DE1,
++ DA3 + -DB3 * DE1};
++VECT_VAR_DECL(expectedfms2, float, 64, 2) [] = {DA4 + -DB4 * DE2,
++ DA5 + -DB5 * DE2};
++VECT_VAR_DECL(expectedfms3, float, 64, 2) [] = {DA6 + -DB6 * DE3,
++ DA7 + -DB7 * DE3};
++VECT_VAR_DECL(expectedfma0, float, 64, 2) [] = {DA0 + DB0 * DE0,
++ DA1 + DB1 * DE0};
++VECT_VAR_DECL(expectedfma1, float, 64, 2) [] = {DA2 + DB2 * DE1,
++ DA3 + DB3 * DE1};
++VECT_VAR_DECL(expectedfma2, float, 64, 2) [] = {DA4 + DB4 * DE2,
++ DA5 + DB5 * DE2};
++VECT_VAR_DECL(expectedfma3, float, 64, 2) [] = {DA6 + DB6 * DE3,
++ DA7 + DB7 * DE3};
++hfloat64_t * VECT_VAR (expectedfms0_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfms0, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfms1_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfms1, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfms2_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfms2, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfms3_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfms3, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfma0_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfma0, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfma1_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfma1, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfma2_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfma2, float, 64, 2);
++hfloat64_t * VECT_VAR (expectedfma3_static, hfloat, 64, 2) =
++ (hfloat64_t *) VECT_VAR (expectedfma3, float, 64, 2);
+
-+ if (count_op != count_ideal)
-+ return false;
++VECT_VAR_DECL(expectedfms0, float, 64, 1) [] = {DA0 + -DB0 * DE0};
++VECT_VAR_DECL(expectedfms1, float, 64, 1) [] = {DA2 + -DB2 * DE1};
++VECT_VAR_DECL(expectedfms2, float, 64, 1) [] = {DA4 + -DB4 * DE2};
++VECT_VAR_DECL(expectedfms3, float, 64, 1) [] = {DA6 + -DB6 * DE3};
++VECT_VAR_DECL(expectedfma0, float, 64, 1) [] = {DA0 + DB0 * DE0};
++VECT_VAR_DECL(expectedfma1, float, 64, 1) [] = {DA2 + DB2 * DE1};
++VECT_VAR_DECL(expectedfma2, float, 64, 1) [] = {DA4 + DB4 * DE2};
++VECT_VAR_DECL(expectedfma3, float, 64, 1) [] = {DA6 + DB6 * DE3};
+
-+ for (i = 0; i < count_ideal; i++)
-+ {
-+ rtx elt_op = XVECEXP (op, 0, i);
-+ rtx elt_ideal = XVECEXP (ideal, 0, i);
++hfloat64_t * VECT_VAR (expectedfms0_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfms0, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfms1_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfms1, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfms2_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfms2, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfms3_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfms3, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfma0_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfma0, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfma1_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfma1, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfma2_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfma2, float, 64, 1);
++hfloat64_t * VECT_VAR (expectedfma3_static, hfloat, 64, 1) =
++ (hfloat64_t *) VECT_VAR (expectedfma3, float, 64, 1);
+
-+ if (!CONST_INT_P (elt_op)
-+ || INTVAL (elt_ideal) != INTVAL (elt_op))
-+ return false;
-+ }
-+ return true;
-+}
++void exec_vfma_vfms_n (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VFMS_VFMA_N (FP32)"
++ clean_results ();
+
- #include "gt-arm.h"
---- a/src/gcc/config/arm/arm.h
-+++ b/src/gcc/config/arm/arm.h
-@@ -478,6 +478,9 @@ extern int arm_tune_cortex_a9;
- interworking clean. */
- extern int arm_cpp_interwork;
-
-+/* Nonzero if chip supports Thumb 1. */
-+extern int arm_arch_thumb1;
++ DECL_VARIABLE(vsrc_1, float, 32, 2);
++ DECL_VARIABLE(vsrc_2, float, 32, 2);
++ VECT_VAR_DECL (buf_src_1, float, 32, 2) [] = {A0, A1};
++ VECT_VAR_DECL (buf_src_2, float, 32, 2) [] = {B0, B1};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 32, 2);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 32, 2);
++ DECL_VARIABLE (vector_res, float, 32, 2) =
++ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem0);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms0_static, "");
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem0);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma0_static, "");
+
- /* Nonzero if chip supports Thumb 2. */
- extern int arm_arch_thumb2;
-
-@@ -2187,13 +2190,9 @@ extern int making_const_table;
- #define TARGET_ARM_ARCH \
- (arm_base_arch) \
-
--#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
--#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
--
- /* The highest Thumb instruction set version supported by the chip. */
--#define TARGET_ARM_ARCH_ISA_THUMB \
-- (arm_arch_thumb2 ? 2 \
-- : ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
-+#define TARGET_ARM_ARCH_ISA_THUMB \
-+ (arm_arch_thumb2 ? 2 : (arm_arch_thumb1 ? 1 : 0))
-
- /* Expands to an upper-case char of the target's architectural
- profile. */
---- a/src/gcc/config/arm/arm.md
-+++ b/src/gcc/config/arm/arm.md
-@@ -121,7 +121,7 @@
- ; arm_arch6. "v6t2" for Thumb-2 with arm_arch6. This attribute is
- ; used to compute attribute "enabled", use type "any" to enable an
- ; alternative in all cases.
--(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
-+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"
- (const_string "any"))
-
- (define_attr "arch_enabled" "no,yes"
-@@ -177,6 +177,10 @@
- (and (eq_attr "arch" "armv6_or_vfpv3")
- (match_test "arm_arch6 || TARGET_VFP3"))
- (const_string "yes")
++ VECT_VAR_DECL (buf_src_3, float, 32, 2) [] = {A2, A3};
++ VECT_VAR_DECL (buf_src_4, float, 32, 2) [] = {B2, B3};
++ VLOAD (vsrc_1, buf_src_3, , float, f, 32, 2);
++ VLOAD (vsrc_2, buf_src_4, , float, f, 32, 2);
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem1);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms1_static, "");
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem1);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma1_static, "");
+
-+ (and (eq_attr "arch" "neon")
-+ (match_test "TARGET_NEON"))
-+ (const_string "yes")
- ]
-
- (const_string "no")))
-@@ -8152,8 +8156,8 @@
- )
-
- (define_insn "probe_stack"
-- [(set (match_operand 0 "memory_operand" "=m")
-- (unspec [(const_int 0)] UNSPEC_PROBE_STACK))]
-+ [(set (match_operand:SI 0 "memory_operand" "=m")
-+ (unspec:SI [(const_int 0)] UNSPEC_PROBE_STACK))]
- "TARGET_32BIT"
- "str%?\\tr0, %0"
- [(set_attr "type" "store1")
-@@ -10821,19 +10825,22 @@
- (set_attr "predicable_short_it" "no")
- (set_attr "type" "clz")])
-
--(define_expand "ctzsi2"
-- [(set (match_operand:SI 0 "s_register_operand" "")
-- (ctz:SI (match_operand:SI 1 "s_register_operand" "")))]
-+;; Keep this as a CTZ expression until after reload and then split
-+;; into RBIT + CLZ. Since RBIT is represented as an UNSPEC it is unlikely
-+;; to fold with any other expression.
++ VECT_VAR_DECL (buf_src_5, float, 32, 2) [] = {A4, A5};
++ VECT_VAR_DECL (buf_src_6, float, 32, 2) [] = {B4, B5};
++ VLOAD (vsrc_1, buf_src_5, , float, f, 32, 2);
++ VLOAD (vsrc_2, buf_src_6, , float, f, 32, 2);
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem2);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms2_static, "");
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem2);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma2_static, "");
+
-+(define_insn_and_split "ctzsi2"
-+ [(set (match_operand:SI 0 "s_register_operand" "=r")
-+ (ctz:SI (match_operand:SI 1 "s_register_operand" "r")))]
- "TARGET_32BIT && arm_arch_thumb2"
-+ "#"
-+ "&& reload_completed"
-+ [(const_int 0)]
- "
-- {
-- rtx tmp = gen_reg_rtx (SImode);
-- emit_insn (gen_rbitsi2 (tmp, operands[1]));
-- emit_insn (gen_clzsi2 (operands[0], tmp));
-- }
-- DONE;
-- "
--)
-+ emit_insn (gen_rbitsi2 (operands[0], operands[1]));
-+ emit_insn (gen_clzsi2 (operands[0], operands[0]));
-+ DONE;
-+")
-
- ;; V5E instructions.
-
---- a/src/gcc/config/arm/arm_neon.h
-+++ b/src/gcc/config/arm/arm_neon.h
-@@ -530,7 +530,7 @@ vadd_s32 (int32x2_t __a, int32x2_t __b)
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
- vadd_f32 (float32x2_t __a, float32x2_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a + __b;
- #else
- return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b);
-@@ -594,7 +594,7 @@ vaddq_s64 (int64x2_t __a, int64x2_t __b)
- __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
- vaddq_f32 (float32x4_t __a, float32x4_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a + __b;
- #else
- return (float32x4_t) __builtin_neon_vaddv4sf (__a, __b);
-@@ -1030,7 +1030,7 @@ vmul_s32 (int32x2_t __a, int32x2_t __b)
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
- vmul_f32 (float32x2_t __a, float32x2_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a * __b;
- #else
- return (float32x2_t) __builtin_neon_vmulfv2sf (__a, __b);
-@@ -1077,7 +1077,7 @@ vmulq_s32 (int32x4_t __a, int32x4_t __b)
- __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
- vmulq_f32 (float32x4_t __a, float32x4_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a * __b;
- #else
- return (float32x4_t) __builtin_neon_vmulfv4sf (__a, __b);
-@@ -1678,7 +1678,7 @@ vsub_s32 (int32x2_t __a, int32x2_t __b)
- __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
- vsub_f32 (float32x2_t __a, float32x2_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a - __b;
- #else
- return (float32x2_t) __builtin_neon_vsubv2sf (__a, __b);
-@@ -1742,7 +1742,7 @@ vsubq_s64 (int64x2_t __a, int64x2_t __b)
- __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
- vsubq_f32 (float32x4_t __a, float32x4_t __b)
- {
--#ifdef __FAST_MATH
-+#ifdef __FAST_MATH__
- return __a - __b;
- #else
- return (float32x4_t) __builtin_neon_vsubv4sf (__a, __b);
-@@ -2607,6 +2607,12 @@ vtst_p8 (poly8x8_t __a, poly8x8_t __b)
- return (uint8x8_t)__builtin_neon_vtstv8qi ((int8x8_t) __a, (int8x8_t) __b);
- }
-
-+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-+vtst_p16 (poly16x4_t __a, poly16x4_t __b)
-+{
-+ return (uint16x4_t)__builtin_neon_vtstv4hi ((int16x4_t) __a, (int16x4_t) __b);
-+}
++ VECT_VAR_DECL (buf_src_7, float, 32, 2) [] = {A6, A7};
++ VECT_VAR_DECL (buf_src_8, float, 32, 2) [] = {B6, B7};
++ VLOAD (vsrc_1, buf_src_7, , float, f, 32, 2);
++ VLOAD (vsrc_2, buf_src_8, , float, f, 32, 2);
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem3);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms3_static, "");
++ VECT_VAR (vector_res, float, 32, 2) =
++ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
++ VECT_VAR (vsrc_2, float, 32, 2), elem3);
++ vst1_f32 (VECT_VAR (result, float, 32, 2),
++ VECT_VAR (vector_res, float, 32, 2));
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma3_static, "");
+
- __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
- vtstq_s8 (int8x16_t __a, int8x16_t __b)
- {
-@@ -2649,6 +2655,12 @@ vtstq_p8 (poly8x16_t __a, poly8x16_t __b)
- return (uint8x16_t)__builtin_neon_vtstv16qi ((int8x16_t) __a, (int8x16_t) __b);
- }
-
-+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-+vtstq_p16 (poly16x8_t __a, poly16x8_t __b)
-+{
-+ return (uint16x8_t)__builtin_neon_vtstv8hi ((int16x8_t) __a, (int16x8_t) __b);
-+}
++#undef TEST_MSG
++#define TEST_MSG "VFMSQ_VFMAQ_N (FP32)"
++ clean_results ();
+
- __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
- vabd_s8 (int8x8_t __a, int8x8_t __b)
- {
---- a/src/gcc/config/arm/crypto.md
-+++ b/src/gcc/config/arm/crypto.md
-@@ -18,14 +18,27 @@
- ;; along with GCC; see the file COPYING3. If not see
- ;; <http://www.gnu.org/licenses/>.
-
++ DECL_VARIABLE(vsrc_1, float, 32, 4);
++ DECL_VARIABLE(vsrc_2, float, 32, 4);
++ VECT_VAR_DECL (buf_src_1, float, 32, 4) [] = {A0, A1, A2, A3};
++ VECT_VAR_DECL (buf_src_2, float, 32, 4) [] = {B0, B1, B2, B3};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 32, 4);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 32, 4);
++ DECL_VARIABLE (vector_res, float, 32, 4) =
++ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem0);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms0_static, "");
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem0);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma0_static, "");
+
-+;; When AES/AESMC fusion is enabled we want the register allocation to
-+;; look like:
-+;; AESE Vn, _
-+;; AESMC Vn, Vn
-+;; So prefer to tie operand 1 to operand 0 when fusing.
++ VECT_VAR_DECL (buf_src_3, float, 32, 4) [] = {A4, A5, A6, A7};
++ VECT_VAR_DECL (buf_src_4, float, 32, 4) [] = {B4, B5, B6, B7};
++ VLOAD (vsrc_1, buf_src_3, q, float, f, 32, 4);
++ VLOAD (vsrc_2, buf_src_4, q, float, f, 32, 4);
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem1);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms1_static, "");
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem1);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma1_static, "");
+
- (define_insn "crypto_<crypto_pattern>"
-- [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
-+ [(set (match_operand:<crypto_mode> 0 "register_operand" "=w,w")
- (unspec:<crypto_mode> [(match_operand:<crypto_mode> 1
-- "register_operand" "w")]
-+ "register_operand" "0,w")]
- CRYPTO_UNARY))]
- "TARGET_CRYPTO"
- "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q1"
-- [(set_attr "type" "<crypto_type>")]
-+ [(set_attr "type" "<crypto_type>")
-+ (set_attr_alternative "enabled"
-+ [(if_then_else (match_test
-+ "arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)")
-+ (const_string "yes" )
-+ (const_string "no"))
-+ (const_string "yes")])]
- )
-
- (define_insn "crypto_<crypto_pattern>"
---- a/src/gcc/config/arm/neon.md
-+++ b/src/gcc/config/arm/neon.md
-@@ -1204,16 +1204,133 @@
-
- ;; Widening operations
-
-+(define_expand "widen_ssum<mode>3"
-+ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
-+ (plus:<V_double_width>
-+ (sign_extend:<V_double_width>
-+ (match_operand:VQI 1 "s_register_operand" ""))
-+ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
-+ "TARGET_NEON"
-+ {
-+ machine_mode mode = GET_MODE (operands[1]);
-+ rtx p1, p2;
++ VECT_VAR_DECL (buf_src_5, float, 32, 4) [] = {A0, A2, A4, A6};
++ VECT_VAR_DECL (buf_src_6, float, 32, 4) [] = {B0, B2, B4, B6};
++ VLOAD (vsrc_1, buf_src_5, q, float, f, 32, 4);
++ VLOAD (vsrc_2, buf_src_6, q, float, f, 32, 4);
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem2);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms2_static, "");
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem2);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma2_static, "");
++
++ VECT_VAR_DECL (buf_src_7, float, 32, 4) [] = {A1, A3, A5, A7};
++ VECT_VAR_DECL (buf_src_8, float, 32, 4) [] = {B1, B3, B5, B7};
++ VLOAD (vsrc_1, buf_src_7, q, float, f, 32, 4);
++ VLOAD (vsrc_2, buf_src_8, q, float, f, 32, 4);
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem3);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms3_static, "");
++ VECT_VAR (vector_res, float, 32, 4) =
++ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
++ VECT_VAR (vsrc_2, float, 32, 4), elem3);
++ vst1q_f32 (VECT_VAR (result, float, 32, 4),
++ VECT_VAR (vector_res, float, 32, 4));
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma3_static, "");
+
-+ p1 = arm_simd_vect_par_cnst_half (mode, false);
-+ p2 = arm_simd_vect_par_cnst_half (mode, true);
++#undef TEST_MSG
++#define TEST_MSG "VFMSQ_VFMAQ_N (FP64)"
++ clean_results ();
+
-+ if (operands[0] != operands[2])
-+ emit_move_insn (operands[0], operands[2]);
++ DECL_VARIABLE(vsrc_1, float, 64, 2);
++ DECL_VARIABLE(vsrc_2, float, 64, 2);
++ VECT_VAR_DECL (buf_src_1, float, 64, 2) [] = {DA0, DA1};
++ VECT_VAR_DECL (buf_src_2, float, 64, 2) [] = {DB0, DB1};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 64, 2);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 64, 2);
++ DECL_VARIABLE (vector_res, float, 64, 2) =
++ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem0);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms0_static, "");
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem0);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma0_static, "");
+
-+ emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0],
-+ operands[1],
-+ p1,
-+ operands[0]));
-+ emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0],
-+ operands[1],
-+ p2,
-+ operands[0]));
-+ DONE;
-+ }
-+)
++ VECT_VAR_DECL (buf_src_3, float, 64, 2) [] = {DA2, DA3};
++ VECT_VAR_DECL (buf_src_4, float, 64, 2) [] = {DB2, DB3};
++ VLOAD (vsrc_1, buf_src_3, q, float, f, 64, 2);
++ VLOAD (vsrc_2, buf_src_4, q, float, f, 64, 2);
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem1);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms1_static, "");
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem1);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma1_static, "");
+
-+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
-+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
-+ (plus:<VW:V_widen>
-+ (sign_extend:<VW:V_widen>
-+ (vec_select:VW
-+ (match_operand:VQI 1 "s_register_operand" "%w")
-+ (match_operand:VQI 2 "vect_par_constant_low" "")))
-+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
-+ "TARGET_NEON"
-+{
-+ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1" :
-+ "vaddw.<V_s_elem>\t%q0, %q3, %e1";
-+}
-+ [(set_attr "type" "neon_add_widen")])
++ VECT_VAR_DECL (buf_src_5, float, 64, 2) [] = {DA4, DA5};
++ VECT_VAR_DECL (buf_src_6, float, 64, 2) [] = {DB4, DB5};
++ VLOAD (vsrc_1, buf_src_5, q, float, f, 64, 2);
++ VLOAD (vsrc_2, buf_src_6, q, float, f, 64, 2);
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem2);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms2_static, "");
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem2);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma2_static, "");
+
-+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
-+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
-+ (plus:<VW:V_widen>
-+ (sign_extend:<VW:V_widen>
-+ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
-+ (match_operand:VQI 2 "vect_par_constant_high" "")))
-+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
-+ "TARGET_NEON"
-+{
-+ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %e1" :
-+ "vaddw.<V_s_elem>\t%q0, %q3, %f1";
-+}
-+ [(set_attr "type" "neon_add_widen")])
++ VECT_VAR_DECL (buf_src_7, float, 64, 2) [] = {DA6, DA7};
++ VECT_VAR_DECL (buf_src_8, float, 64, 2) [] = {DB6, DB7};
++ VLOAD (vsrc_1, buf_src_7, q, float, f, 64, 2);
++ VLOAD (vsrc_2, buf_src_8, q, float, f, 64, 2);
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem3);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms3_static, "");
++ VECT_VAR (vector_res, float, 64, 2) =
++ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
++ VECT_VAR (vsrc_2, float, 64, 2), delem3);
++ vst1q_f64 (VECT_VAR (result, float, 64, 2),
++ VECT_VAR (vector_res, float, 64, 2));
++ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma3_static, "");
+
- (define_insn "widen_ssum<mode>3"
- [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
-- (plus:<V_widen> (sign_extend:<V_widen>
-- (match_operand:VW 1 "s_register_operand" "%w"))
-- (match_operand:<V_widen> 2 "s_register_operand" "w")))]
-+ (plus:<V_widen>
-+ (sign_extend:<V_widen>
-+ (match_operand:VW 1 "s_register_operand" "%w"))
-+ (match_operand:<V_widen> 2 "s_register_operand" "w")))]
- "TARGET_NEON"
- "vaddw.<V_s_elem>\t%q0, %q2, %P1"
- [(set_attr "type" "neon_add_widen")]
- )
-
-+(define_expand "widen_usum<mode>3"
-+ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
-+ (plus:<V_double_width>
-+ (zero_extend:<V_double_width>
-+ (match_operand:VQI 1 "s_register_operand" ""))
-+ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
-+ "TARGET_NEON"
-+ {
-+ machine_mode mode = GET_MODE (operands[1]);
-+ rtx p1, p2;
++#undef TEST_MSG
++#define TEST_MSG "VFMS_VFMA_N (FP64)"
++ clean_results ();
+
-+ p1 = arm_simd_vect_par_cnst_half (mode, false);
-+ p2 = arm_simd_vect_par_cnst_half (mode, true);
++ DECL_VARIABLE(vsrc_1, float, 64, 1);
++ DECL_VARIABLE(vsrc_2, float, 64, 1);
++ VECT_VAR_DECL (buf_src_1, float, 64, 1) [] = {DA0};
++ VECT_VAR_DECL (buf_src_2, float, 64, 1) [] = {DB0};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 64, 1);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 64, 1);
++ DECL_VARIABLE (vector_res, float, 64, 1) =
++ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem0);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms0_static, "");
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem0);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma0_static, "");
+
-+ if (operands[0] != operands[2])
-+ emit_move_insn (operands[0], operands[2]);
++ VECT_VAR_DECL (buf_src_3, float, 64, 1) [] = {DA2};
++ VECT_VAR_DECL (buf_src_4, float, 64, 1) [] = {DB2};
++ VLOAD (vsrc_1, buf_src_3, , float, f, 64, 1);
++ VLOAD (vsrc_2, buf_src_4, , float, f, 64, 1);
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem1);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms1_static, "");
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem1);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma1_static, "");
+
-+ emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0],
-+ operands[1],
-+ p1,
-+ operands[0]));
-+ emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0],
-+ operands[1],
-+ p2,
-+ operands[0]));
-+ DONE;
-+ }
-+)
++ VECT_VAR_DECL (buf_src_5, float, 64, 1) [] = {DA4};
++ VECT_VAR_DECL (buf_src_6, float, 64, 1) [] = {DB4};
++ VLOAD (vsrc_1, buf_src_5, , float, f, 64, 1);
++ VLOAD (vsrc_2, buf_src_6, , float, f, 64, 1);
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem2);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms2_static, "");
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem2);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma2_static, "");
+
-+(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
-+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
-+ (plus:<VW:V_widen>
-+ (zero_extend:<VW:V_widen>
-+ (vec_select:VW
-+ (match_operand:VQI 1 "s_register_operand" "%w")
-+ (match_operand:VQI 2 "vect_par_constant_low" "")))
-+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
-+ "TARGET_NEON"
-+{
-+ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %f1" :
-+ "vaddw.<V_u_elem>\t%q0, %q3, %e1";
++ VECT_VAR_DECL (buf_src_7, float, 64, 1) [] = {DA6};
++ VECT_VAR_DECL (buf_src_8, float, 64, 1) [] = {DB6};
++ VLOAD (vsrc_1, buf_src_7, , float, f, 64, 1);
++ VLOAD (vsrc_2, buf_src_8, , float, f, 64, 1);
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem3);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms3_static, "");
++ VECT_VAR (vector_res, float, 64, 1) =
++ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
++ VECT_VAR (vsrc_2, float, 64, 1), delem3);
++ vst1_f64 (VECT_VAR (result, float, 64, 1),
++ VECT_VAR (vector_res, float, 64, 1));
++ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma3_static, "");
+}
-+ [(set_attr "type" "neon_add_widen")])
++#endif
+
-+(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
-+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
-+ (plus:<VW:V_widen>
-+ (zero_extend:<VW:V_widen>
-+ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
-+ (match_operand:VQI 2 "vect_par_constant_high" "")))
-+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
-+ "TARGET_NEON"
++int
++main (void)
+{
-+ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %e1" :
-+ "vaddw.<V_u_elem>\t%q0, %q3, %f1";
++#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
++ exec_vfma_vfms_n ();
++#endif
++ return 0;
+}
-+ [(set_attr "type" "neon_add_widen")])
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
- (define_insn "widen_usum<mode>3"
- [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
- (plus:<V_widen> (zero_extend:<V_widen>
---- a/src/gcc/config/arm/predicates.md
-+++ b/src/gcc/config/arm/predicates.md
-@@ -612,59 +612,13 @@
- (define_special_predicate "vect_par_constant_high"
- (match_code "parallel")
- {
-- HOST_WIDE_INT count = XVECLEN (op, 0);
-- int i;
-- int base = GET_MODE_NUNITS (mode);
--
-- if ((count < 1)
-- || (count != base/2))
-- return false;
--
-- if (!VECTOR_MODE_P (mode))
-- return false;
--
-- for (i = 0; i < count; i++)
-- {
-- rtx elt = XVECEXP (op, 0, i);
-- int val;
--
-- if (!CONST_INT_P (elt))
-- return false;
--
-- val = INTVAL (elt);
-- if (val != (base/2) + i)
-- return false;
-- }
-- return true;
-+ return arm_simd_check_vect_par_cnst_half_p (op, mode, true);
- })
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x42af /* 3.341797 */,
++ 0x5043 /* 34.093750 */,
++ 0xccd2 /* -19.281250 */,
++ 0x3712 /* 0.441895 */,
++ 0x3acc /* 0.849609 */,
++ 0x4848 /* 8.562500 */,
++ 0xcc43 /* -17.046875 */,
++ 0xd65c /* -101.750000 */,
++ 0x4185 /* 2.759766 */,
++ 0xcd39 /* -20.890625 */,
++ 0xd45b /* -69.687500 */,
++ 0x5241 /* 50.031250 */,
++ 0xc675 /* -6.457031 */,
++ 0x4d07 /* 20.109375 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VFMSH_F16"
++#define INSN_NAME vfmsh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "ternary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
+@@ -13,6 +13,7 @@ uint32_t expected_u32 = 0xfffffff1;
+ uint64_t expected_u64 = 0xfffffffffffffff0;
+ poly8_t expected_p8 = 0xf6;
+ poly16_t expected_p16 = 0xfff2;
++hfloat16_t expected_f16 = 0xcb80;
+ hfloat32_t expected_f32 = 0xc1700000;
- (define_special_predicate "vect_par_constant_low"
- (match_code "parallel")
- {
-- HOST_WIDE_INT count = XVECLEN (op, 0);
-- int i;
-- int base = GET_MODE_NUNITS (mode);
--
-- if ((count < 1)
-- || (count != base/2))
-- return false;
--
-- if (!VECTOR_MODE_P (mode))
-- return false;
--
-- for (i = 0; i < count; i++)
-- {
-- rtx elt = XVECEXP (op, 0, i);
-- int val;
--
-- if (!CONST_INT_P (elt))
-- return false;
--
-- val = INTVAL (elt);
-- if (val != i)
-- return false;
-- }
-- return true;
-+ return arm_simd_check_vect_par_cnst_half_p (op, mode, false);
- })
+ int8_t expectedq_s8 = 0xff;
+@@ -25,6 +26,7 @@ uint32_t expectedq_u32 = 0xfffffff2;
+ uint64_t expectedq_u64 = 0xfffffffffffffff1;
+ poly8_t expectedq_p8 = 0xfe;
+ poly16_t expectedq_p16 = 0xfff6;
++hfloat16_t expectedq_f16 = 0xca80;
+ hfloat32_t expectedq_f32 = 0xc1500000;
- (define_predicate "const_double_vcvt_power_of_two_reciprocal"
---- a/src/gcc/config/arm/sync.md
-+++ b/src/gcc/config/arm/sync.md
-@@ -452,14 +452,13 @@
- {
- if (<MODE>mode == DImode)
- {
-- rtx value = operands[2];
- /* The restrictions on target registers in ARM mode are that the two
- registers are consecutive and the first one is even; Thumb is
- actually more flexible, but DI should give us this anyway.
-- Note that the 1st register always gets the lowest word in memory. */
-- gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
-- operands[3] = gen_rtx_REG (SImode, REGNO (value) + 1);
-- return "strexd%?\t%0, %2, %3, %C1";
-+ Note that the 1st register always gets the
-+ lowest word in memory. */
-+ gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-+ return "strexd%?\t%0, %2, %H2, %C1";
- }
- return "strex<sync_sfx>%?\t%0, %2, %C1";
- }
-@@ -475,11 +474,9 @@
- VUNSPEC_SLX))]
- "TARGET_HAVE_LDACQ && ARM_DOUBLEWORD_ALIGN"
- {
-- rtx value = operands[2];
- /* See comment in arm_store_exclusive<mode> above. */
-- gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
-- operands[3] = gen_rtx_REG (SImode, REGNO (value) + 1);
-- return "stlexd%?\t%0, %2, %3, %C1";
-+ gcc_assert ((REGNO (operands[2]) & 1) == 0 || TARGET_THUMB2);
-+ return "stlexd%?\t%0, %2, %H2, %C1";
- }
- [(set_attr "predicable" "yes")
- (set_attr "predicable_short_it" "no")])
---- a/src/gcc/config/arm/thumb1.md
-+++ b/src/gcc/config/arm/thumb1.md
-@@ -142,11 +142,11 @@
- (set_attr "type" "alus_sreg")]
- )
+ int error_found = 0;
+@@ -52,6 +54,12 @@ void exec_vget_lane (void)
+ uint32_t var_int32;
+ float32_t var_float32;
+ } var_int32_float32;
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ union {
++ uint16_t var_int16;
++ float16_t var_float16;
++ } var_int16_float16;
++#endif
--; Unfortunately with the Thumb the '&'/'0' trick can fails when operands
--; 1 and 2; are the same, because reload will make operand 0 match
--; operand 1 without realizing that this conflicts with operand 2. We fix
--; this by adding another alternative to match this case, and then `reload'
--; it ourselves. This alternative must come first.
-+;; Unfortunately on Thumb the '&'/'0' trick can fail when operands
-+;; 1 and 2 are the same, because reload will make operand 0 match
-+;; operand 1 without realizing that this conflicts with operand 2. We fix
-+;; this by adding another alternative to match this case, and then `reload'
-+;; it ourselves. This alternative must come first.
- (define_insn "*thumb_mulsi3"
- [(set (match_operand:SI 0 "register_operand" "=&l,&l,&l")
- (mult:SI (match_operand:SI 1 "register_operand" "%l,*h,0")
---- a/src/gcc/config/arm/vfp.md
-+++ b/src/gcc/config/arm/vfp.md
-@@ -394,8 +394,8 @@
- ;; DFmode moves
+ #define TEST_VGET_LANE_FP(Q, T1, T2, W, N, L) \
+ VAR(var, T1, W) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
+@@ -81,10 +89,17 @@ void exec_vget_lane (void)
+ VAR_DECL(var, uint, 64);
+ VAR_DECL(var, poly, 8);
+ VAR_DECL(var, poly, 16);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ VAR_DECL(var, float, 16);
++#endif
+ VAR_DECL(var, float, 32);
- (define_insn "*movdf_vfp"
-- [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w ,Uv,r, m,w,r")
-- (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,UvF,w ,mF,r,w,r"))]
-+ [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w ,Uv,r, m,w,r")
-+ (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,G,UvF,w ,mF,r,w,r"))]
- "TARGET_ARM && TARGET_HARD_FLOAT && TARGET_VFP
- && ( register_operand (operands[0], DFmode)
- || register_operand (operands[1], DFmode))"
-@@ -410,39 +410,43 @@
- case 2:
- gcc_assert (TARGET_VFP_DOUBLE);
- return \"vmov%?.f64\\t%P0, %1\";
-- case 3: case 4:
-+ case 3:
-+ gcc_assert (TARGET_VFP_DOUBLE);
-+ return \"vmov.i64\\t%P0, #0\\t%@ float\";
-+ case 4: case 5:
- return output_move_vfp (operands);
-- case 5: case 6:
-+ case 6: case 7:
- return output_move_double (operands, true, NULL);
-- case 7:
-+ case 8:
- if (TARGET_VFP_SINGLE)
- return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\";
- else
- return \"vmov%?.f64\\t%P0, %P1\";
-- case 8:
-+ case 9:
- return \"#\";
- default:
- gcc_unreachable ();
- }
- }
- "
-- [(set_attr "type" "f_mcrr,f_mrrc,fconstd,f_loadd,f_stored,\
-+ [(set_attr "type" "f_mcrr,f_mrrc,fconstd,neon_move,f_loadd,f_stored,\
- load2,store2,ffarithd,multiple")
-- (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
-- (eq_attr "alternative" "7")
-+ (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8)
-+ (eq_attr "alternative" "8")
- (if_then_else
- (match_test "TARGET_VFP_SINGLE")
- (const_int 8)
- (const_int 4))]
- (const_int 4)))
-- (set_attr "predicable" "yes")
-- (set_attr "pool_range" "*,*,*,1020,*,1020,*,*,*")
-- (set_attr "neg_pool_range" "*,*,*,1004,*,1004,*,*,*")]
-+ (set_attr "predicable" "yes,yes,yes,no,yes,yes,yes,yes,yes,yes")
-+ (set_attr "pool_range" "*,*,*,*,1020,*,1020,*,*,*")
-+ (set_attr "neg_pool_range" "*,*,*,*,1004,*,1004,*,*,*")
-+ (set_attr "arch" "any,any,any,neon,any,any,any,any,any,any")]
- )
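For context: the extra alternative added here (the new "G" column, type neon_move) emits "vmov.i64 %P0, #0" so a DFmode zero no longer needs a literal-pool load, and the later alternatives plus their length/pool_range attributes are renumbered to match; the *thumb2_movdf_vfp pattern below gets the same treatment. A hedged illustration, assuming NEON and a hard-float ABI:

/* May now compile to a single "vmov.i64 d0, #0" instead of a vldr
   from the constant pool.  Illustration only, not from the patch.  */
double
zero (void)
{
  return 0.0;
}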
+ /* Initialize input values. */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+
+@@ -99,6 +114,9 @@ void exec_vget_lane (void)
+ TEST_VGET_LANE(, uint, u, 64, 1, 0);
+ TEST_VGET_LANE(, poly, p, 8, 8, 6);
+ TEST_VGET_LANE(, poly, p, 16, 4, 2);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VGET_LANE_FP(, float, f, 16, 4, 1);
++#endif
+ TEST_VGET_LANE_FP(, float, f, 32, 2, 1);
+
+ TEST_VGET_LANE(q, int, s, 8, 16, 15);
+@@ -111,6 +129,9 @@ void exec_vget_lane (void)
+ TEST_VGET_LANE(q, uint, u, 64, 2, 1);
+ TEST_VGET_LANE(q, poly, p, 8, 16, 14);
+ TEST_VGET_LANE(q, poly, p, 16, 8, 6);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VGET_LANE_FP(q, float, f, 16, 8, 3);
++#endif
+ TEST_VGET_LANE_FP(q, float, f, 32, 4, 3);
+ }
+
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ float16x4x2_t
+ f_vld2_lane_f16 (float16_t * p, float16x4x2_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ float16x8x2_t
+ f_vld2q_lane_f16 (float16_t * p, float16x8x2_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ float16x4x3_t
+ f_vld3_lane_f16 (float16_t * p, float16x4x3_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
- (define_insn "*thumb2_movdf_vfp"
-- [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w ,Uv,r ,m,w,r")
-- (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,UvF,w, mF,r, w,r"))]
-+ [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=w,?r,w ,w,w ,Uv,r ,m,w,r")
-+ (match_operand:DF 1 "soft_df_operand" " ?r,w,Dy,G,UvF,w, mF,r, w,r"))]
- "TARGET_THUMB2 && TARGET_HARD_FLOAT && TARGET_VFP
- && ( register_operand (operands[0], DFmode)
- || register_operand (operands[1], DFmode))"
-@@ -457,11 +461,14 @@
- case 2:
- gcc_assert (TARGET_VFP_DOUBLE);
- return \"vmov%?.f64\\t%P0, %1\";
-- case 3: case 4:
-+ case 3:
-+ gcc_assert (TARGET_VFP_DOUBLE);
-+ return \"vmov.i64\\t%P0, #0\\t%@ float\";
-+ case 4: case 5:
- return output_move_vfp (operands);
-- case 5: case 6: case 8:
-+ case 6: case 7: case 9:
- return output_move_double (operands, true, NULL);
-- case 7:
-+ case 8:
- if (TARGET_VFP_SINGLE)
- return \"vmov%?.f32\\t%0, %1\;vmov%?.f32\\t%p0, %p1\";
- else
-@@ -471,17 +478,18 @@
- }
- }
- "
-- [(set_attr "type" "f_mcrr,f_mrrc,fconstd,f_loadd,\
-+ [(set_attr "type" "f_mcrr,f_mrrc,fconstd,neon_move,f_loadd,\
- f_stored,load2,store2,ffarithd,multiple")
-- (set (attr "length") (cond [(eq_attr "alternative" "5,6,8") (const_int 8)
-- (eq_attr "alternative" "7")
-+ (set (attr "length") (cond [(eq_attr "alternative" "6,7,9") (const_int 8)
-+ (eq_attr "alternative" "8")
- (if_then_else
- (match_test "TARGET_VFP_SINGLE")
- (const_int 8)
- (const_int 4))]
- (const_int 4)))
-- (set_attr "pool_range" "*,*,*,1018,*,4094,*,*,*")
-- (set_attr "neg_pool_range" "*,*,*,1008,*,0,*,*,*")]
-+ (set_attr "pool_range" "*,*,*,*,1018,*,4094,*,*,*")
-+ (set_attr "neg_pool_range" "*,*,*,*,1008,*,0,*,*,*")
-+ (set_attr "arch" "any,any,any,neon,any,any,any,any,any,any")]
- )
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+ float16x8x3_t
+ f_vld3q_lane_f16 (float16_t * p, float16x8x3_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
---- a/src/gcc/configure
-+++ b/src/gcc/configure
-@@ -1711,7 +1711,8 @@ Optional Packages:
- --with-stabs arrange to use stabs instead of host debug format
- --with-dwarf2 force the default debug format to be DWARF 2
- --with-specs=SPECS add SPECS to driver command-line processing
-- --with-pkgversion=PKG Use PKG in the version string in place of "GCC"
-+ --with-pkgversion=PKG Use PKG in the version string in place of "Linaro
-+ GCC `cat $srcdir/LINARO-VERSION`"
- --with-bugurl=URL Direct users to URL to report a bug
- --with-multilib-list select multilibs (AArch64, SH and x86-64 only)
- --with-gnu-ld assume the C compiler uses GNU ld default=no
-@@ -7651,7 +7652,7 @@ if test "${with_pkgversion+set}" = set; then :
- *) PKGVERSION="($withval) " ;;
- esac
- else
-- PKGVERSION="(GCC) "
-+ PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
- fi
+ float16x4x4_t
+ f_vld4_lane_f16 (float16_t * p, float16x4x4_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
-@@ -18453,7 +18454,7 @@ else
- lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
- lt_status=$lt_dlunknown
- cat > conftest.$ac_ext <<_LT_EOF
--#line 18456 "configure"
-+#line 18457 "configure"
- #include "confdefs.h"
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
- #if HAVE_DLFCN_H
-@@ -18559,7 +18560,7 @@ else
- lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
- lt_status=$lt_dlunknown
- cat > conftest.$ac_ext <<_LT_EOF
--#line 18562 "configure"
-+#line 18563 "configure"
- #include "confdefs.h"
+ float16x8x4_t
+ f_vld4q_lane_f16 (float16_t * p, float16x8x4_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c
+@@ -7,6 +7,10 @@
- #if HAVE_DLFCN_H
---- a/src/gcc/configure.ac
-+++ b/src/gcc/configure.ac
-@@ -903,7 +903,7 @@ AC_ARG_WITH(specs,
- )
- AC_SUBST(CONFIGURE_SPECS)
+ #define HAS_FLOAT_VARIANT
--ACX_PKGVERSION([GCC])
-+ACX_PKGVERSION([Linaro GCC `cat $srcdir/LINARO-VERSION`])
- ACX_BUGURL([http://gcc.gnu.org/bugs.html])
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++#define HAS_FLOAT16_VARIANT
++#endif
++
+ /* Expected results. */
+ VECT_VAR_DECL(expected,int,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+@@ -16,6 +20,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff3 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcbc0, 0xcb80, 0xcb00, 0xca80 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1780000, 0xc1700000 };
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0xf4, 0xf4, 0xf4, 0xf4,
+ 0xf4, 0xf5, 0xf6, 0xf7,
+@@ -33,10 +40,36 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff1, 0xfffffff1,
+ 0xfffffff2, 0xfffffff3 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcb40, 0xcb40, 0xcb00, 0xca80,
++ 0xca00, 0xc980, 0xc900, 0xc880 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1680000, 0xc1680000,
+ 0xc1600000, 0xc1500000 };
- # Sanity check enable_languages in case someone does not run the toplevel
---- a/src/gcc/cppbuiltin.c
-+++ b/src/gcc/cppbuiltin.c
-@@ -52,18 +52,41 @@ parse_basever (int *major, int *minor, int *patchlevel)
- *patchlevel = s_patchlevel;
- }
+ /* Expected results with special FP values. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00 };
++VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
+ VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+ 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxh_f16_1.c
+@@ -0,0 +1,34 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 123.4
++#define B -567.8
++#define C -34.8
++#define D 1024
++#define E 663.1
++#define F 169.1
++#define G -4.8
++#define H 77
++
++float16_t input_1[] = { A, B, C, D };
++float16_t input_2[] = { E, F, G, H };
++float16_t expected[] = { E, F, G, D };
++
++#define TEST_MSG "VMAXH_F16"
++#define INSN_NAME vmaxh_f16
++
++#define INPUT_1 input_1
++#define INPUT_2 input_2
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnm_1.c
+@@ -0,0 +1,47 @@
++/* This file tests an intrinsic which currently has only an f16 variant, and
++   which is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define INSN_NAME vmaxnm
++#define TEST_MSG "VMAXNM/VMAXNMQ"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++#define HAS_FLOAT16_VARIANT
++#endif
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcbc0, 0xcb80, 0xcb00, 0xca80 };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcb40, 0xcb40, 0xcb00, 0xca80,
++ 0xca00, 0xc980, 0xc900, 0xc880 };
++#endif
++
++/* Expected results with special FP values. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00 };
++VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
++
++#include "binary_op_float.inc"
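For context: the expected_nan/expected_mnan entries above are 0x3c00 (1.0 in FP16) rather than a NaN pattern because vmaxnm follows the IEEE maxNum rule: when exactly one operand is a NaN, the numeric operand is returned. Plain vmax propagates the NaN instead, which is why vmax.c above uses 0x7e00 for the same checks; the non-NaN operand in these special-value tests is 1.0 (compare expected_minf, 0x3c00, in vmax.c). The C library analogue behaves the same way, as a quick sanity check:

#include <assert.h>
#include <math.h>

int
main (void)
{
  /* fmaxf also follows the maxNum rule: the non-NaN operand wins.  */
  assert (fmaxf (__builtin_nanf (""), 1.0f) == 1.0f);
  return 0;
}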
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++#define INFF __builtin_inf ()
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x5640 /* 100.000000 */,
++ 0x4f80 /* 30.000000 */,
++ 0x3666 /* 0.399902 */,
++ 0x3800 /* 0.500000 */,
++ 0x3d52 /* 1.330078 */,
++ 0xc64d /* -6.300781 */,
++ 0x4d00 /* 20.000000 */,
++ 0x355d /* 0.335205 */,
++ 0x409a /* 2.300781 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a91 /* 13.132812 */,
++ 0x34f6 /* 0.310059 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0x7c00 /* inf */
++};
++
++#define TEST_MSG "VMAXNMH_F16"
++#define INSN_NAME vmaxnmh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
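For context: the expected[] arrays in these scalar FP16 tests are raw IEEE binary16 bit patterns (0x3c00 is 1.0, 0x4d00 is 20.0, 0x7c00 is +inf). A small helper sketch for reproducing such constants, assuming a GCC target where __fp16 uses the IEEE format; the helper name is illustrative, not from the testsuite:

#include <stdint.h>
#include <string.h>

/* Round a float to __fp16 and return its raw bits,
   e.g. fp16_bits (1.0f) == 0x3c00, fp16_bits (20.0f) == 0x4d00.  */
static uint16_t
fp16_bits (float x)
{
  __fp16 h = (__fp16) x;
  uint16_t u;
  memcpy (&u, &h, sizeof u);
  return u;
}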
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxnmv_f16_1.c
+@@ -0,0 +1,131 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (34.8)
++#define B0 FP16_C (__builtin_nanf (""))
++#define C0 FP16_C (-__builtin_nanf (""))
++#define D0 FP16_C (0.0)
++
++#define A1 FP16_C (1025.8)
++#define B1 FP16_C (13.4)
++#define C1 FP16_C (__builtin_nanf (""))
++#define D1 FP16_C (10)
++#define E1 FP16_C (-0.0)
++#define F1 FP16_C (-__builtin_nanf (""))
++#define G1 FP16_C (0.0)
++#define H1 FP16_C (10)
++
++/* Expected results for vmaxnmv. */
++uint16_t expect = 0x505A /* A0. */;
++uint16_t expect_alt = 0x6402 /* A1. */;
++
++void exec_vmaxnmv_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VMAXNMV (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ float16_t vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
++ VLOAD (vsrc, buf_src1, , float, f, 16, 4);
++ vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
++ VLOAD (vsrc, buf_src2, , float, f, 16, 4);
++ vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
++ VLOAD (vsrc, buf_src3, , float, f, 16, 4);
++ vector_res = vmaxnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++#undef TEST_MSG
++#define TEST_MSG "VMAXNMVQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
++ VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
++ VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
++ VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
++ VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
++ vector_res = vmaxnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++}
++
++int
++main (void)
++{
++ exec_vmaxnmv_f16 ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmaxv_f16_1.c
+@@ -0,0 +1,131 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (123.4)
++#define B0 FP16_C (-567.8)
++#define C0 FP16_C (34.8)
++#define D0 FP16_C (0.0)
++
++#define A1 FP16_C (1025.8)
++#define B1 FP16_C (13.4)
++#define C1 FP16_C (-567.8)
++#define D1 FP16_C (10)
++#define E1 FP16_C (-0.0)
++#define F1 FP16_C (567.8)
++#define G1 FP16_C (0.0)
++#define H1 FP16_C (10)
++
++/* Expected results for vmaxv. */
++uint16_t expect = 0x57B6 /* A0. */;
++uint16_t expect_alt = 0x6402 /* A1. */;
++
++void exec_vmaxv_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VMAXV (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ float16_t vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
++ VLOAD (vsrc, buf_src1, , float, f, 16, 4);
++ vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
++ VLOAD (vsrc, buf_src2, , float, f, 16, 4);
++ vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
++ VLOAD (vsrc, buf_src3, , float, f, 16, 4);
++ vector_res = vmaxv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++#undef TEST_MSG
++#define TEST_MSG "VMAXVQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
++ VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
++ VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
++ VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
++ VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
++ vector_res = vmaxvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++}
++
++int
++main (void)
++{
++ exec_vmaxv_f16 ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c
+@@ -7,6 +7,10 @@
-+/* Parse a LINAROVER version string of the format "M.m-year.month[-spin][~dev]"
-+ to create Linaro release number YYYYMM and spin version. */
-+static void
-+parse_linarover (int *release, int *spin)
+ #define HAS_FLOAT_VARIANT
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++#define HAS_FLOAT16_VARIANT
++#endif
++
+ /* Expected results. */
+ VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf3, 0xf3, 0xf3, 0xf3 };
+@@ -16,6 +20,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf3, 0xf3, 0xf3, 0xf3 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff1, 0xfff1 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcbc0, 0xcbc0, 0xcbc0 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1780000 };
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf4, 0xf4, 0xf4,
+@@ -31,11 +38,41 @@ VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf9, 0xf9, 0xf9, 0xf9 };
+ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff2,
+ 0xfff2, 0xfff2, 0xfff2, 0xfff2 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80, 0xcb40, 0xcb40,
++ 0xcb40, 0xcb40, 0xcb40, 0xcb40 };
++#endif
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
+ 0xfffffff1, 0xfffffff1 };
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0xc1680000, 0xc1680000 };
+ /* Expected results with special FP values. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00 };
++VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++#endif
+ VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+ 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminh_f16_1.c
+@@ -0,0 +1,34 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 123.4
++#define B -567.8
++#define C -34.8
++#define D 1024
++#define E 663.1
++#define F 169.1
++#define G -4.8
++#define H 77
++
++float16_t input_1[] = { A, B, C, D };
++float16_t input_2[] = { E, F, G, H };
++float16_t expected[] = { A, B, C, H };
++
++#define TEST_MSG "VMINH_F16"
++#define INSN_NAME vminh_f16
++
++#define INPUT_1 input_1
++#define INPUT_2 input_2
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnm_1.c
+@@ -0,0 +1,51 @@
++/* This file tests an intrinsic which currently has only an f16 variant, and
++   which is only available when FP16 arithmetic instructions are supported. */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define INSN_NAME vminnm
++#define TEST_MSG "VMINNM/VMINNMQ"
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++#define HAS_FLOAT16_VARIANT
++#endif
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcbc0, 0xcbc0, 0xcbc0 };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80, 0xcb40, 0xcb40,
++ 0xcb40, 0xcb40, 0xcb40, 0xcb40 };
++#endif
++
++/* Expected results with special FP values. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_mnan, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_inf, hfloat, 16, 8) [] = { 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00,
++ 0x3c00, 0x3c00 };
++VECT_VAR_DECL(expected_minf, hfloat, 16, 8) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00 };
++VECT_VAR_DECL(expected_zero1, hfloat, 16, 8) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++VECT_VAR_DECL(expected_zero2, hfloat, 16, 8) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++#endif
++
++#include "binary_op_float.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++#define INFF __builtin_inf ()
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0xc454 /* -4.328125 */,
++ 0x4233 /* 3.099609 */,
++ 0x4d00 /* 20.000000 */,
++ 0xa51f /* -0.020004 */,
++ 0xc09a /* -2.300781 */,
++ 0xc73b /* -7.230469 */,
++ 0xc79a /* -7.601562 */,
++ 0x34f6 /* 0.310059 */,
++ 0xc73b /* -7.230469 */,
++ 0x3800 /* 0.500000 */,
++ 0xc79a /* -7.601562 */,
++ 0x451a /* 5.101562 */,
++ 0xc64d /* -6.300781 */,
++ 0x3556 /* 0.333496 */,
++ 0xfc00 /* -inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VMINNMH_F16"
++#define INSN_NAME vminnmh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminnmv_f16_1.c
+@@ -0,0 +1,131 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (-567.8)
++#define B0 FP16_C (__builtin_nanf (""))
++#define C0 FP16_C (34.8)
++#define D0 FP16_C (-__builtin_nanf (""))
++
++#define A1 FP16_C (-567.8)
++#define B1 FP16_C (1025.8)
++#define C1 FP16_C (-__builtin_nanf (""))
++#define D1 FP16_C (10)
++#define E1 FP16_C (-0.0)
++#define F1 FP16_C (__builtin_nanf (""))
++#define G1 FP16_C (0.0)
++#define H1 FP16_C (10)
++
++/* Expected results for vminnmv. */
++uint16_t expect = 0xE070 /* A0. */;
++uint16_t expect_alt = 0xE070 /* A1. */;
++
++void exec_vminnmv_f16 (void)
+{
-+ static int s_year = -1, s_month, s_spin;
++#undef TEST_MSG
++#define TEST_MSG "VMINNMV (FP16)"
++ clean_results ();
+
-+ if (s_year == -1)
-+ if (sscanf (LINAROVER, "%*[^-]-%d.%d-%d", &s_year, &s_month, &s_spin) != 3)
-+ {
-+ sscanf (LINAROVER, "%*[^-]-%d.%d", &s_year, &s_month);
-+ s_spin = 0;
-+ }
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ float16_t vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
-+ if (release)
-+ *release = s_year * 100 + s_month;
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
+
-+ if (spin)
-+ *spin = s_spin;
++ VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
++ VLOAD (vsrc, buf_src1, , float, f, 16, 4);
++ vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
++ VLOAD (vsrc, buf_src2, , float, f, 16, 4);
++ vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
++ VLOAD (vsrc, buf_src3, , float, f, 16, 4);
++ vector_res = vminnmv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++#undef TEST_MSG
++#define TEST_MSG "VMINNMVQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
++ VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
++ VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
++ VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
++ VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
++ vector_res = vminnmvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
+}
-
- /* Define __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __VERSION__. */
- static void
- define__GNUC__ (cpp_reader *pfile)
- {
-- int major, minor, patchlevel;
-+ int major, minor, patchlevel, linaro_release, linaro_spin;
-
- parse_basever (&major, &minor, &patchlevel);
-+ parse_linarover (&linaro_release, &linaro_spin);
- cpp_define_formatted (pfile, "__GNUC__=%d", major);
- cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
- cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
- cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
-+ cpp_define_formatted (pfile, "__LINARO_RELEASE__=%d", linaro_release);
-+ cpp_define_formatted (pfile, "__LINARO_SPIN__=%d", linaro_spin);
- cpp_define_formatted (pfile, "__ATOMIC_RELAXED=%d", MEMMODEL_RELAXED);
- cpp_define_formatted (pfile, "__ATOMIC_SEQ_CST=%d", MEMMODEL_SEQ_CST);
- cpp_define_formatted (pfile, "__ATOMIC_ACQUIRE=%d", MEMMODEL_ACQUIRE);
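For context: parse_linarover above splits the LINAROVER string into a YYYYMM release number and a spin, and define__GNUC__ then exports them as __LINARO_RELEASE__ and __LINARO_SPIN__. A standalone sketch of the same sscanf logic follows; the example string is an assumption about what a LINARO-VERSION snapshot entry looks like:

#include <stdio.h>

int
main (void)
{
  const char *linarover = "6.2-2016.10-1";   /* hypothetical contents */
  int year = 0, month = 0, spin = 0;

  if (sscanf (linarover, "%*[^-]-%d.%d-%d", &year, &month, &spin) != 3)
    {
      sscanf (linarover, "%*[^-]-%d.%d", &year, &month);
      spin = 0;
    }
  /* Prints __LINARO_RELEASE__=201610 __LINARO_SPIN__=1.  */
  printf ("__LINARO_RELEASE__=%d __LINARO_SPIN__=%d\n",
          year * 100 + month, spin);
  return 0;
}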
---- a/src/gcc/ifcvt.c
-+++ b/src/gcc/ifcvt.c
-@@ -817,6 +817,7 @@ struct noce_if_info
-
- static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
- static int noce_try_move (struct noce_if_info *);
-+static int noce_try_ifelse_collapse (struct noce_if_info *);
- static int noce_try_store_flag (struct noce_if_info *);
- static int noce_try_addcc (struct noce_if_info *);
- static int noce_try_store_flag_constants (struct noce_if_info *);
-@@ -1120,6 +1121,37 @@ noce_try_move (struct noce_if_info *if_info)
- return FALSE;
- }
-
-+/* Try forming an IF_THEN_ELSE (cond, b, a) and collapsing that
-+ through simplify_rtx. Sometimes that can eliminate the IF_THEN_ELSE.
-+ If that is the case, emit the result into x. */
+
-+static int
-+noce_try_ifelse_collapse (struct noce_if_info * if_info)
++int
++main (void)
+{
-+ if (!noce_simple_bbs (if_info))
-+ return FALSE;
++ exec_vminnmv_f16 ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vminv_f16_1.c
+@@ -0,0 +1,131 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+ machine_mode mode = GET_MODE (if_info->x);
-+ rtx if_then_else = simplify_gen_ternary (IF_THEN_ELSE, mode, mode,
-+ if_info->cond, if_info->b,
-+ if_info->a);
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ if (GET_CODE (if_then_else) == IF_THEN_ELSE)
-+ return FALSE;
++#define FP16_C(a) ((__fp16) a)
++#define A0 FP16_C (-567.8)
++#define B0 FP16_C (123.4)
++#define C0 FP16_C (34.8)
++#define D0 FP16_C (0.0)
++
++#define A1 FP16_C (-567.8)
++#define B1 FP16_C (1025.8)
++#define C1 FP16_C (13.4)
++#define D1 FP16_C (10)
++#define E1 FP16_C (-0.0)
++#define F1 FP16_C (567.8)
++#define G1 FP16_C (0.0)
++#define H1 FP16_C (10)
++
++/* Expected results for vminv. */
++uint16_t expect = 0xE070 /* A0. */;
++uint16_t expect_alt = 0xE070 /* A1. */;
++
++void exec_vminv_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VMINV (FP16)"
++ clean_results ();
+
-+ rtx_insn *seq;
-+ start_sequence ();
-+ noce_emit_move_insn (if_info->x, if_then_else);
-+ seq = end_ifcvt_sequence (if_info);
-+ if (!seq)
-+ return FALSE;
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A0, B0, C0, D0};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ float16_t vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
-+ emit_insn_before_setloc (seq, if_info->jump,
-+ INSN_LOCATION (if_info->insn_a));
-+ return TRUE;
-+}
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
+
++ VECT_VAR_DECL (buf_src1, float, 16, 4) [] = {B0, A0, C0, D0};
++ VLOAD (vsrc, buf_src1, , float, f, 16, 4);
++ vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
- /* Convert "if (test) x = 1; else x = 0".
-
- Only try 0 and STORE_FLAG_VALUE here. Other combinations will be
-@@ -2364,28 +2396,32 @@ noce_get_alt_condition (struct noce_if_info *if_info, rtx target,
- switch (code)
- {
- case LT:
-- if (actual_val == desired_val + 1)
-+ if (desired_val != HOST_WIDE_INT_MAX
-+ && actual_val == desired_val + 1)
- {
- code = LE;
- op_b = GEN_INT (desired_val);
- }
- break;
- case LE:
-- if (actual_val == desired_val - 1)
-+ if (desired_val != HOST_WIDE_INT_MIN
-+ && actual_val == desired_val - 1)
- {
- code = LT;
- op_b = GEN_INT (desired_val);
- }
- break;
- case GT:
-- if (actual_val == desired_val - 1)
-+ if (desired_val != HOST_WIDE_INT_MIN
-+ && actual_val == desired_val - 1)
- {
- code = GE;
- op_b = GEN_INT (desired_val);
- }
- break;
- case GE:
-- if (actual_val == desired_val + 1)
-+ if (desired_val != HOST_WIDE_INT_MAX
-+ && actual_val == desired_val + 1)
- {
- code = GT;
- op_b = GEN_INT (desired_val);
-@@ -3493,6 +3529,8 @@ noce_process_if_block (struct noce_if_info *if_info)
-
- if (noce_try_move (if_info))
- goto success;
-+ if (noce_try_ifelse_collapse (if_info))
-+ goto success;
- if (noce_try_store_flag (if_info))
- goto success;
- if (noce_try_bitop (if_info))
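For context: two separate fixes land in this ifcvt.c hunk. noce_try_ifelse_collapse hands the whole (cond ? b : a) expression to simplify_rtx so that cases the dedicated transforms miss can still fold away, and the new HOST_WIDE_INT_MAX/HOST_WIDE_INT_MIN guards in noce_get_alt_condition stop the LT/LE and GT/GE rewrites from being applied across the signed wraparound point. The classic shape the collapse path catches, together with the simplify_cond_clz_ctz recogniser added in the simplify-rtx.c hunk further down, is sketched below, assuming a target whose CLZ is defined for a zero input (AArch64, for example):

unsigned int
count_leading_zeros (unsigned int x)
{
  /* Where CLZ_DEFINED_VALUE_AT_ZERO says the instruction already
     returns 32 for x == 0, the IF_THEN_ELSE folds to a bare CLZ and
     the branch disappears.  */
  return x ? __builtin_clz (x) : 32;
}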
---- a/src/gcc/internal-fn.c
-+++ b/src/gcc/internal-fn.c
-@@ -1807,11 +1807,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
- /* For sub-word operations, retry with a wider type first. */
- if (orig_precres == precres && precop <= BITS_PER_WORD)
- {
--#if WORD_REGISTER_OPERATIONS
-- int p = BITS_PER_WORD;
--#else
-- int p = precop;
--#endif
-+ int p = WORD_REGISTER_OPERATIONS ? BITS_PER_WORD : precop;
- enum machine_mode m = smallest_mode_for_size (p, MODE_INT);
- tree optype = build_nonstandard_integer_type (GET_MODE_PRECISION (m),
- uns0_p && uns1_p
---- a/src/gcc/lra-constraints.c
-+++ b/src/gcc/lra-constraints.c
-@@ -1303,7 +1303,22 @@ process_addr_reg (rtx *loc, bool check_only_p, rtx_insn **before, rtx_insn **aft
-
- subreg_p = GET_CODE (*loc) == SUBREG;
- if (subreg_p)
-- loc = &SUBREG_REG (*loc);
-+ {
-+ reg = SUBREG_REG (*loc);
-+ mode = GET_MODE (reg);
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
+
-+ /* For mode with size bigger than ptr_mode, there unlikely to be "mov"
-+ between two registers with different classes, but there normally will
-+ be "mov" which transfers element of vector register into the general
-+ register, and this normally will be a subreg which should be reloaded
-+ as a whole. This is particularly likely to be triggered when
-+ -fno-split-wide-types specified. */
-+ if (!REG_P (reg)
-+ || in_class_p (reg, cl, &new_class)
-+ || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
-+ loc = &SUBREG_REG (*loc);
-+ }
++ VECT_VAR_DECL (buf_src2, float, 16, 4) [] = {B0, C0, A0, D0};
++ VLOAD (vsrc, buf_src2, , float, f, 16, 4);
++ vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
+
- reg = *loc;
- mode = GET_MODE (reg);
- if (! REG_P (reg))
---- a/src/gcc/lto/lto-partition.c
-+++ b/src/gcc/lto/lto-partition.c
-@@ -447,7 +447,7 @@ add_sorted_nodes (vec<symtab_node *> &next_nodes, ltrans_partition partition)
- and in-partition calls was reached. */
-
- void
--lto_balanced_map (int n_lto_partitions)
-+lto_balanced_map (int n_lto_partitions, int max_partition_size)
- {
- int n_nodes = 0;
- int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
-@@ -511,6 +511,9 @@ lto_balanced_map (int n_lto_partitions)
- varpool_order.qsort (varpool_node_cmp);
-
- /* Compute partition size and create the first partition. */
-+ if (PARAM_VALUE (MIN_PARTITION_SIZE) > max_partition_size)
-+ fatal_error (input_location, "min partition size cannot be greater than max partition size");
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
+
- partition_size = total_size / n_lto_partitions;
- if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
- partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
-@@ -719,7 +722,8 @@ lto_balanced_map (int n_lto_partitions)
- best_cost, best_internal, best_i);
- /* Partition is too large, unwind into step when best cost was reached and
- start new partition. */
-- if (partition->insns > 2 * partition_size)
-+ if (partition->insns > 2 * partition_size
-+ || partition->insns > max_partition_size)
- {
- if (best_i != i)
- {
---- a/src/gcc/lto/lto-partition.h
-+++ b/src/gcc/lto/lto-partition.h
-@@ -35,7 +35,7 @@ extern vec<ltrans_partition> ltrans_partitions;
++ VECT_VAR_DECL (buf_src3, float, 16, 4) [] = {B0, C0, D0, A0};
++ VLOAD (vsrc, buf_src3, , float, f, 16, 4);
++ vector_res = vminv_f16 (VECT_VAR (vsrc, float, 16, 4));
++
++ if (* (uint16_t *) &vector_res != expect)
++ abort ();
++
++#undef TEST_MSG
++#define TEST_MSG "VMINVQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A1, B1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src1, float, 16, 8) [] = {B1, A1, C1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src1, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src2, float, 16, 8) [] = {B1, C1, A1, D1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src2, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src3, float, 16, 8) [] = {B1, C1, D1, A1, E1, F1, G1, H1};
++ VLOAD (vsrc, buf_src3, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src4, float, 16, 8) [] = {B1, C1, D1, E1, A1, F1, G1, H1};
++ VLOAD (vsrc, buf_src4, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src5, float, 16, 8) [] = {B1, C1, D1, E1, F1, A1, G1, H1};
++ VLOAD (vsrc, buf_src5, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src6, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, A1, H1};
++ VLOAD (vsrc, buf_src6, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++
++ VECT_VAR_DECL (buf_src7, float, 16, 8) [] = {B1, C1, D1, E1, F1, G1, H1, A1};
++ VLOAD (vsrc, buf_src7, q, float, f, 16, 8);
++ vector_res = vminvq_f16 (VECT_VAR (vsrc, float, 16, 8));
++
++ if (* (uint16_t *) &vector_res != expect_alt)
++ abort ();
++}
++
++int
++main (void)
++{
++ exec_vminv_f16 ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
+@@ -13,6 +13,10 @@ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffff9a0, 0xfffffa06 };
+ VECT_VAR_DECL(expected,poly,8,8) [] = { 0xc0, 0x84, 0x48, 0xc,
+ 0xd0, 0x94, 0x58, 0x1c };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xe02a, 0xdfcf,
++ 0xdf4a, 0xdec4 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc4053333, 0xc3f9c000 };
+ VECT_VAR_DECL(expected,int,8,16) [] = { 0x90, 0x7, 0x7e, 0xf5,
+ 0x6c, 0xe3, 0x5a, 0xd1,
+@@ -34,13 +38,15 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 0x9e,
+ 0xc8, 0x62, 0x9c, 0x36,
+ 0x30, 0x9a, 0x64, 0xce,
+ 0x98, 0x32, 0xcc, 0x66 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe63a, 0xe5d6, 0xe573, 0xe50f,
++ 0xe4ac, 0xe448, 0xe3c8, 0xe301 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c73333, 0xc4bac000,
+ 0xc4ae4ccd, 0xc4a1d999 };
- void lto_1_to_1_map (void);
- void lto_max_map (void);
--void lto_balanced_map (int);
-+void lto_balanced_map (int, int);
- void lto_promote_cross_file_statics (void);
- void free_ltrans_partitions (void);
- void lto_promote_statics_nonwpa (void);
---- a/src/gcc/lto/lto.c
-+++ b/src/gcc/lto/lto.c
-@@ -3117,9 +3117,10 @@ do_whole_program_analysis (void)
- else if (flag_lto_partition == LTO_PARTITION_MAX)
- lto_max_map ();
- else if (flag_lto_partition == LTO_PARTITION_ONE)
-- lto_balanced_map (1);
-+ lto_balanced_map (1, INT_MAX);
- else if (flag_lto_partition == LTO_PARTITION_BALANCED)
-- lto_balanced_map (PARAM_VALUE (PARAM_LTO_PARTITIONS));
-+ lto_balanced_map (PARAM_VALUE (PARAM_LTO_PARTITIONS),
-+ PARAM_VALUE (MAX_PARTITION_SIZE));
- else
- gcc_unreachable ();
+-#ifndef INSN_NAME
+ #define INSN_NAME vmul
+ #define TEST_MSG "VMUL"
+-#endif
---- a/src/gcc/params.def
-+++ b/src/gcc/params.def
-@@ -1027,7 +1027,12 @@ DEFPARAM (PARAM_LTO_PARTITIONS,
- DEFPARAM (MIN_PARTITION_SIZE,
- "lto-min-partition",
- "Minimal size of a partition for LTO (in estimated instructions).",
-- 1000, 0, 0)
-+ 10000, 0, 0)
+ #define FNNAME1(NAME) exec_ ## NAME
+ #define FNNAME(NAME) FNNAME1(NAME)
+@@ -80,6 +86,17 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VMUL(poly, 8, 16);
+ DECL_VMUL(float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector1, float, 16, 4);
++ DECL_VARIABLE(vector1, float, 16, 8);
+
-+DEFPARAM (MAX_PARTITION_SIZE,
-+ "lto-max-partition",
-+ "Maximal size of a partition for LTO (in estimated instructions).",
-+ 1000000, 0, INT_MAX)
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
++
+ clean_results ();
- /* Diagnostic parameters. */
+ /* Initialize input "vector1" from "buffer". */
+@@ -99,6 +116,10 @@ void FNNAME (INSN_NAME) (void)
+ VLOAD(vector1, buffer, q, uint, u, 32, 4);
+ VLOAD(vector1, buffer, q, poly, p, 8, 16);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector1, buffer, , float, f, 16, 4);
++ VLOAD(vector1, buffer, q, float, f, 16, 8);
++#endif
---- a/src/gcc/rtlanal.c
-+++ b/src/gcc/rtlanal.c
-@@ -3657,6 +3657,16 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
- info->offset = offset / regsize_xmode;
- return;
- }
-+ /* It's not valid to extract a subreg of mode YMODE at OFFSET that
-+ would go outside of XMODE. */
-+ if (!rknown
-+ && GET_MODE_SIZE (ymode) + offset > GET_MODE_SIZE (xmode))
-+ {
-+ info->representable_p = false;
-+ info->nregs = nregs_ymode;
-+ info->offset = offset / regsize_xmode;
-+ return;
-+ }
- /* Quick exit for the simple and common case of extracting whole
- subregisters from a multiregister value. */
- /* ??? It would be better to integrate this into the code below,
---- a/src/gcc/simplify-rtx.c
-+++ b/src/gcc/simplify-rtx.c
-@@ -5267,6 +5267,50 @@ simplify_const_relational_operation (enum rtx_code code,
+ /* Choose init value arbitrarily. */
+ VDUP(vector2, , int, s, 8, 8, 0x11);
+@@ -117,6 +138,10 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, q, uint, u, 32, 4, 0xCC);
+ VDUP(vector2, q, poly, p, 8, 16, 0xAA);
+ VDUP(vector2, q, float, f, 32, 4, 99.6f);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 33.3f);
++ VDUP(vector2, q, float, f, 16, 8, 99.6f);
++#endif
- return 0;
+ /* Execute the tests. */
+ TEST_VMUL(INSN_NAME, , int, s, 8, 8);
+@@ -135,6 +160,10 @@ void FNNAME (INSN_NAME) (void)
+ TEST_VMUL(INSN_NAME, q, uint, u, 32, 4);
+ TEST_VMUL(INSN_NAME, q, poly, p, 8, 16);
+ TEST_VMUL(INSN_NAME, q, float, f, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VMUL(INSN_NAME, , float, f, 16, 4);
++ TEST_VMUL(INSN_NAME, q, float, f, 16, 8);
++#endif
+
+ CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, "");
+ CHECK(TEST_MSG, int, 16, 4, PRIx16, expected, "");
+@@ -152,6 +181,10 @@ void FNNAME (INSN_NAME) (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+ CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+ }
+
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
+@@ -7,6 +7,9 @@ VECT_VAR_DECL(expected,int,16,4) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc };
+ VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffde0, 0xfffffe02 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffface0, 0xffffb212 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xddb3, 0xdd58, 0xdcfd, 0xdca1 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b66666, 0xc3ab0000 };
+ VECT_VAR_DECL(expected,int,16,8) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc,
+ 0xffd0, 0xffd4, 0xffd8, 0xffdc };
+@@ -16,6 +19,10 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c,
+ 0xccd0, 0xd114, 0xd558, 0xd99c };
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffface0, 0xffffb212,
+ 0xffffb744, 0xffffbc76 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xddb3, 0xdd58, 0xdcfd, 0xdca1,
++ 0xdc46, 0xdbd6, 0xdb20, 0xda69 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc3b66666, 0xc3ab0000,
+ 0xc39f9999, 0xc3943333 };
+
+@@ -45,11 +52,20 @@ void exec_vmul_lane (void)
+
+ DECL_VMUL(vector);
+ DECL_VMUL(vector_res);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+
+ DECL_VARIABLE(vector2, int, 16, 4);
+ DECL_VARIABLE(vector2, int, 32, 2);
+ DECL_VARIABLE(vector2, uint, 16, 4);
+ DECL_VARIABLE(vector2, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector2, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector2, float, 32, 2);
+
+ clean_results ();
+@@ -59,11 +75,17 @@ void exec_vmul_lane (void)
+ VLOAD(vector, buffer, , int, s, 32, 2);
+ VLOAD(vector, buffer, , uint, u, 16, 4);
+ VLOAD(vector, buffer, , uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, int, s, 16, 8);
+ VLOAD(vector, buffer, q, int, s, 32, 4);
+ VLOAD(vector, buffer, q, uint, u, 16, 8);
+ VLOAD(vector, buffer, q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+
+ /* Initialize vector2. */
+@@ -71,6 +93,9 @@ void exec_vmul_lane (void)
+ VDUP(vector2, , int, s, 32, 2, 0x22);
+ VDUP(vector2, , uint, u, 16, 4, 0x444);
+ VDUP(vector2, , uint, u, 32, 2, 0x532);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 22.8f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, 22.8f);
+
+ /* Choose lane arbitrarily. */
+@@ -78,22 +103,34 @@ void exec_vmul_lane (void)
+ TEST_VMUL_LANE(, int, s, 32, 2, 2, 1);
+ TEST_VMUL_LANE(, uint, u, 16, 4, 4, 2);
+ TEST_VMUL_LANE(, uint, u, 32, 2, 2, 1);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VMUL_LANE(, float, f, 16, 4, 4, 1);
++#endif
+ TEST_VMUL_LANE(, float, f, 32, 2, 2, 1);
+ TEST_VMUL_LANE(q, int, s, 16, 8, 4, 2);
+ TEST_VMUL_LANE(q, int, s, 32, 4, 2, 0);
+ TEST_VMUL_LANE(q, uint, u, 16, 8, 4, 2);
+ TEST_VMUL_LANE(q, uint, u, 32, 4, 2, 1);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VMUL_LANE(q, float, f, 16, 8, 4, 0);
++#endif
+ TEST_VMUL_LANE(q, float, f, 32, 4, 2, 0);
+
+ CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+ CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+ CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+ CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
+ CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+ CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
}
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane_f16_1.c
+@@ -0,0 +1,454 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+/* Recognize expressions of the form (X CMP 0) ? VAL : OP (X)
-+ where OP is CLZ or CTZ and VAL is the value from CLZ_DEFINED_VALUE_AT_ZERO
-+ or CTZ_DEFINED_VALUE_AT_ZERO respectively and return OP (X) if the expression
-+ can be simplified to that or NULL_RTX if not.
-+ Assume X is compared against zero with CMP_CODE and the true
-+ arm is TRUE_VAL and the false arm is FALSE_VAL. */
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+static rtx
-+simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (-56.8)
++#define C FP16_C (-34.8)
++#define D FP16_C (12)
++#define E FP16_C (63.1)
++#define F FP16_C (19.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (77)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-8)
++#define O FP16_C (-1.1)
++#define P FP16_C (-9.7)
++
++/* Expected results for vmul_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
++ = { 0x629B /* A * E. */,
++ 0xEB00 /* B * E. */,
++ 0xE84A /* C * E. */,
++ 0x61EA /* D * E. */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
++ = { 0x5BFF /* A * F. */,
++ 0xE43D /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0x5B29 /* D * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
++ = { 0xD405 /* A * G. */,
++ 0x5C43 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0xD334 /* D * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
++ = { 0x6408 /* A * H. */,
++ 0xEC46 /* B * H. */,
++ 0xE93C /* C * H. */,
++ 0x6338 /* D * H. */ };
++
++/* Expected results for vmulq_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
++ = { 0x629B /* A * E. */,
++ 0xEB00 /* B * E. */,
++ 0xE84A /* C * E. */,
++ 0x61EA /* D * E. */,
++ 0x5186 /* I * E. */,
++ 0xECCE /* J * E. */,
++ 0x6189 /* K * E. */,
++ 0x6E0A /* L * E. */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
++ = { 0x5BFF /* A * F. */,
++ 0xE43D /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0x5B29 /* D * F. */,
++ 0x4AAF /* I * F. */,
++ 0xE5D1 /* J * F. */,
++ 0x5AB3 /* K * F. */,
++ 0x674F /* L * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
++ = { 0xD405 /* A * G. */,
++ 0x5C43 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0xD334 /* D * G. */,
++ 0xC2B9 /* I * G. */,
++ 0x5DDA /* J * G. */,
++ 0xD2BD /* K * G. */,
++ 0xDF5A /* L * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
++ = { 0x6408 /* A * H. */,
++ 0xEC46 /* B * H. */,
++ 0xE93C /* C * H. */,
++ 0x6338 /* D * H. */,
++ 0x52BD /* I * H. */,
++ 0xEDDE /* J * H. */,
++ 0x62C1 /* K * H. */,
++ 0x6F5E /* L * H. */ };
++
++/* Expected results for vmul_laneq. */
++VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 4) []
++ = { 0x629B /* A * E. */,
++ 0xEB00 /* B * E. */,
++ 0xE84A /* C * E. */,
++ 0x61EA /* D * E. */ };
++
++VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 4) []
++ = { 0x5BFF /* A * F. */,
++ 0xE43D /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0x5B29 /* D * F. */ };
++
++VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 4) []
++ = { 0xD405 /* A * G. */,
++ 0x5C43 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0xD334 /* D * G. */ };
++
++VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 4) []
++ = { 0x6408 /* A * H. */,
++ 0xEC46 /* B * H. */,
++ 0xE93C /* C * H. */,
++ 0x6338 /* D * H. */ };
++
++VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 4) []
++ = { 0x648F /* A * M. */,
++ 0xECD5 /* B * M. */,
++ 0xE9ED /* C * M. */,
++ 0x6416 /* D * M. */ };
++
++VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 4) []
++ = { 0xD6B3 /* A * N. */,
++ 0x5F1A /* B * N. */,
++ 0x5C5A /* C * N. */,
++ 0xD600 /* D * N. */ };
++
++VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 4) []
++ = { 0xCB5E /* A * O. */,
++ 0x53CF /* B * O. */,
++ 0x50C9 /* C * O. */,
++ 0xCA99 /* D * O. */ };
++
++VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 4) []
++ = { 0xD810 /* A * P. */,
++ 0x604F /* B * P. */,
++ 0x5D47 /* C * P. */,
++ 0xD747 /* D * P. */ };
++
++/* Expected results for vmulq_laneq. */
++VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 8) []
++ = { 0x629B /* A * E. */,
++ 0xEB00 /* B * E. */,
++ 0xE84A /* C * E. */,
++ 0x61EA /* D * E. */,
++ 0x5186 /* I * E. */,
++ 0xECCE /* J * E. */,
++ 0x6189 /* K * E. */,
++ 0x6E0A /* L * E. */ };
++
++VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 8) []
++ = { 0x5BFF /* A * F. */,
++ 0xE43D /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0x5B29 /* D * F. */,
++ 0x4AAF /* I * F. */,
++ 0xE5D1 /* J * F. */,
++ 0x5AB3 /* K * F. */,
++ 0x674F /* L * F. */ };
++
++VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 8) []
++ = { 0xD405 /* A * G. */,
++ 0x5C43 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0xD334 /* D * G. */,
++ 0xC2B9 /* I * G. */,
++ 0x5DDA /* J * G. */,
++ 0xD2BD /* K * G. */,
++ 0xDF5A /* L * G. */ };
++
++VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 8) []
++ = { 0x6408 /* A * H. */,
++ 0xEC46 /* B * H. */,
++ 0xE93C /* C * H. */,
++ 0x6338 /* D * H. */,
++ 0x52BD /* I * H. */,
++ 0xEDDE /* J * H. */,
++ 0x62C1 /* K * H. */,
++ 0x6F5E /* L * H. */ };
++
++VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 8) []
++ = { 0x648F /* A * M. */,
++ 0xECD5 /* B * M. */,
++ 0xE9ED /* C * M. */,
++ 0x6416 /* D * M. */,
++ 0x53A0 /* I * M. */,
++ 0xEEA3 /* J * M. */,
++ 0x63A4 /* K * M. */,
++ 0x702B /* L * M. */ };
++
++VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 8) []
++ = { 0xD6B3 /* A * N. */,
++ 0x5F1A /* B * N. */,
++ 0x5C5A /* C * N. */,
++ 0xD600 /* D * N. */,
++ 0xC59A /* I * N. */,
++ 0x60E0 /* J * N. */,
++ 0xD59D /* K * N. */,
++ 0xE220 /* L * N. */ };
++
++VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 8) []
++ = { 0xCB5E /* A * O. */,
++ 0x53CF /* B * O. */,
++ 0x50C9 /* C * O. */,
++ 0xCA99 /* D * O. */,
++ 0xBA29 /* I * O. */,
++ 0x555C /* J * O. */,
++ 0xCA2C /* K * O. */,
++ 0xD6BC /* L * O. */ };
++
++VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 8) []
++ = { 0xD810 /* A * P. */,
++ 0x604F /* B * P. */,
++ 0x5D47 /* C * P. */,
++ 0xD747 /* D * P. */,
++ 0xC6CB /* I * P. */,
++ 0x61EA /* J * P. */,
++ 0xD6CF /* K * P. */,
++ 0xE36E /* L * P. */ };
++
++void exec_vmul_lane_f16 (void)
+{
-+ if (cmp_code != EQ && cmp_code != NE)
-+ return NULL_RTX;
++#undef TEST_MSG
++#define TEST_MSG "VMUL_LANE (FP16)"
++ clean_results ();
+
-+ /* Result on X == 0 and X !=0 respectively. */
-+ rtx on_zero, on_nonzero;
-+ if (cmp_code == EQ)
-+ {
-+ on_zero = true_val;
-+ on_nonzero = false_val;
-+ }
-+ else
-+ {
-+ on_zero = false_val;
-+ on_nonzero = true_val;
-+ }
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+ rtx_code op_code = GET_CODE (on_nonzero);
-+ if ((op_code != CLZ && op_code != CTZ)
-+ || !rtx_equal_p (XEXP (on_nonzero, 0), x)
-+ || !CONST_INT_P (on_zero))
-+ return NULL_RTX;
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
+
-+ HOST_WIDE_INT op_val;
-+ if (((op_code == CLZ
-+ && CLZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val))
-+ || (op_code == CTZ
-+ && CTZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val)))
-+ && op_val == INTVAL (on_zero))
-+ return on_nonzero;
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+ return NULL_RTX;
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULQ_LANE (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMUL_LANEQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq3_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 4);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq4_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 5);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq5_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 6);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq6_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmul_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 7);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq7_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULQ_LANEQ (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq3_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq4_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq5_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq6_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq7_static, "");
+}
+
-
- /* Simplify CODE, an operation with result mode MODE and three operands,
- OP0, OP1, and OP2. OP0_MODE was the mode of OP0 before it became
-@@ -5400,6 +5444,19 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
- }
- }
++int
++main (void)
++{
++ exec_vmul_lane_f16 ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
+@@ -7,6 +7,9 @@ VECT_VAR_DECL(expected,int,16,4) [] = { 0xfef0, 0xff01, 0xff12, 0xff23 };
+ VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffde0, 0xfffffe02 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfcd0, 0xfd03, 0xfd36, 0xfd69 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffbc0, 0xfffffc04 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xdd93, 0xdd3a, 0xdce1, 0xdc87 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b26666, 0xc3a74000 };
+ VECT_VAR_DECL(expected,int,16,8) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf,
+ 0xfc04, 0xfc59, 0xfcae, 0xfd03 };
+@@ -16,6 +19,10 @@ VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf890, 0xf907, 0xf97e, 0xf9f5,
+ 0xfa6c, 0xfae3, 0xfb5a, 0xfbd1 };
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffff780, 0xfffff808,
+ 0xfffff890, 0xfffff918 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xe58e, 0xe535, 0xe4dc, 0xe483,
++ 0xe42a, 0xe3a3, 0xe2f2, 0xe240 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4b1cccd, 0xc4a6b000,
+ 0xc49b9333, 0xc4907667 };
-+ /* Convert x == 0 ? N : clz (x) into clz (x) when
-+ CLZ_DEFINED_VALUE_AT_ZERO is defined to N for the mode of x.
-+ Similarly for ctz (x). */
-+ if (COMPARISON_P (op0) && !side_effects_p (op0)
-+ && XEXP (op0, 1) == const0_rtx)
-+ {
-+ rtx simplified
-+ = simplify_cond_clz_ctz (XEXP (op0, 0), GET_CODE (op0),
-+ op1, op2);
-+ if (simplified)
-+ return simplified;
-+ }
-+
- if (COMPARISON_P (op0) && ! side_effects_p (op0))
- {
- machine_mode cmp_mode = (GET_MODE (XEXP (op0, 0)) == VOIDmode
---- a/src/gcc/testsuite/g++.dg/lto/pr69589_0.C
-+++ b/src/gcc/testsuite/g++.dg/lto/pr69589_0.C
-@@ -1,6 +1,8 @@
- // { dg-lto-do link }
--// { dg-lto-options "-O2 -rdynamic" }
-+// { dg-lto-options "-O2 -rdynamic" }
- // { dg-extra-ld-options "-r -nostdlib" }
-+// { dg-skip-if "Skip targets without -rdynamic support" { arm*-none-eabi aarch64*-*-elf } { "*" } { "" } }
+@@ -50,6 +57,13 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VMUL(vector);
+ DECL_VMUL(vector_res);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+
- #pragma GCC visibility push(hidden)
- struct A { int &operator[] (long); };
- template <typename> struct B;
+ clean_results ();
+
+ /* Initialize vector from pre-initialized values. */
+@@ -57,11 +71,17 @@ void FNNAME (INSN_NAME) (void)
+ VLOAD(vector, buffer, , int, s, 32, 2);
+ VLOAD(vector, buffer, , uint, u, 16, 4);
+ VLOAD(vector, buffer, , uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, int, s, 16, 8);
+ VLOAD(vector, buffer, q, int, s, 32, 4);
+ VLOAD(vector, buffer, q, uint, u, 16, 8);
+ VLOAD(vector, buffer, q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+
+ /* Choose multiplier arbitrarily. */
+@@ -69,22 +89,34 @@ void FNNAME (INSN_NAME) (void)
+ TEST_VMUL_N(, int, s, 32, 2, 0x22);
+ TEST_VMUL_N(, uint, u, 16, 4, 0x33);
+ TEST_VMUL_N(, uint, u, 32, 2, 0x44);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VMUL_N(, float, f, 16, 4, 22.3f);
++#endif
+ TEST_VMUL_N(, float, f, 32, 2, 22.3f);
+ TEST_VMUL_N(q, int, s, 16, 8, 0x55);
+ TEST_VMUL_N(q, int, s, 32, 4, 0x66);
+ TEST_VMUL_N(q, uint, u, 16, 8, 0x77);
+ TEST_VMUL_N(q, uint, u, 32, 4, 0x88);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VMUL_N(q, float, f, 16, 8, 88.9f);
++#endif
+ TEST_VMUL_N(q, float, f, 32, 4, 88.9f);
+
+ CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+ CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+ CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+ CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
+ CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+ CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
+ }
+
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.c-torture/compile/pr71295.c
-@@ -0,0 +1,12 @@
-+extern void fn2 (long long);
-+int a;
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++#define INFF __builtin_inf ()
+
-+void
-+fn1 ()
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
+{
-+ long long b[3];
-+ a = 0;
-+ for (; a < 3; a++)
-+ b[a] = 1;
-+ fn2 (b[1]);
-+}
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0xc854 /* -8.656250 */,
++ 0x5cd8 /* 310.000000 */,
++ 0x60b0 /* 600.000000 */,
++ 0xa019 /* -0.008003 */,
++ 0xbc9a /* -1.150391 */,
++ 0xc8cf /* -9.617188 */,
++ 0x51fd /* 47.906250 */,
++ 0x4634 /* 6.203125 */,
++ 0xc0d9 /* -2.423828 */,
++ 0x3c9a /* 1.150391 */,
++ 0xc79a /* -7.601562 */,
++ 0x5430 /* 67.000000 */,
++ 0xbfd0 /* -1.953125 */,
++ 0x46ac /* 6.671875 */,
++ 0xfc00 /* -inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VMULH_F16"
++#define INSN_NAME vmulh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.c-torture/execute/pr37780.c
-@@ -0,0 +1,49 @@
-+/* PR middle-end/37780. */
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulh_lane_f16_1.c
+@@ -0,0 +1,90 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+#define VAL (8 * sizeof (int))
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+int __attribute__ ((noinline, noclone))
-+fooctz (int i)
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (-56.8)
++#define C FP16_C (-34.8)
++#define D FP16_C (12)
++#define E FP16_C (63.1)
++#define F FP16_C (19.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (77)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-8)
++#define O FP16_C (-1.1)
++#define P FP16_C (-9.7)
++
++extern void abort ();
++
++float16_t src1[8] = { A, B, C, D, I, J, K, L };
++VECT_VAR_DECL (src2, float, 16, 4) [] = { E, F, G, H };
++VECT_VAR_DECL (src2, float, 16, 8) [] = { E, F, G, H, M, N, O, P };
++
++/* Expected results for vmulh_lane. */
++uint16_t expected[4] = { 0x629B /* A * E. */, 0xE43D /* B * F. */,
++ 0x5939 /* C * G. */, 0x6338 /* D * H. */ };
++
++
++/* Expected results for vmulh_laneq. */
++uint16_t expected_laneq[8] = { 0x629B /* A * E. */,
++ 0xE43D /* B * F. */,
++ 0x5939 /* C * G. */,
++ 0x6338 /* D * H. */,
++ 0x53A0 /* I * M. */,
++ 0x60E0 /* J * N. */,
++ 0xCA2C /* K * O. */,
++ 0xE36E /* L * P. */ };
++
++void exec_vmulh_lane_f16 (void)
+{
-+ return (i == 0) ? VAL : __builtin_ctz (i);
++#define CHECK_LANE(N)\
++ ret = vmulh_lane_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 4), N);\
++ if (*(uint16_t *) &ret != expected[N])\
++ abort ();
++
++ DECL_VARIABLE(vsrc2, float, 16, 4);
++ VLOAD (vsrc2, src2, , float, f, 16, 4);
++ float16_t ret;
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++
++#undef CHECK_LANE
++#define CHECK_LANE(N)\
++ ret = vmulh_laneq_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 8), N);\
++ if (*(uint16_t *) &ret != expected_laneq[N])\
++ abort ();
++
++ DECL_VARIABLE(vsrc2, float, 16, 8);
++ VLOAD (vsrc2, src2, q, float, f, 16, 8);
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++ CHECK_LANE(4)
++ CHECK_LANE(5)
++ CHECK_LANE(6)
++ CHECK_LANE(7)
+}
+
-+int __attribute__ ((noinline, noclone))
-+fooctz2 (int i)
++int
++main (void)
+{
-+ return (i != 0) ? __builtin_ctz (i) : VAL;
++ exec_vmulh_lane_f16 ();
++ return 0;
+}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_f16_1.c
+@@ -0,0 +1,84 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+unsigned int __attribute__ ((noinline, noclone))
-+fooctz3 (unsigned int i)
-+{
-+ return (i > 0) ? __builtin_ctz (i) : VAL;
-+}
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+int __attribute__ ((noinline, noclone))
-+fooclz (int i)
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (__builtin_inff ())
++#define C FP16_C (-34.8)
++#define D FP16_C (-__builtin_inff ())
++#define E FP16_C (63.1)
++#define F FP16_C (0.0)
++#define G FP16_C (-4.8)
++#define H FP16_C (0.0)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-__builtin_inff ())
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-0.0)
++#define O FP16_C (-1.1)
++#define P FP16_C (7)
++
++/* Expected results for vmulx. */
++VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
++ = { 0x629B /* A * E. */, 0x4000 /* FP16_C (2.0f). */,
++ 0x5939 /* C * G. */, 0xC000 /* FP16_C (-2.0f). */ };
++
++VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
++ = { 0x629B /* A * E. */, 0x4000 /* FP16_C (2.0f). */,
++ 0x5939 /* C * G. */, 0xC000 /* FP16_C (-2.0f). */,
++ 0x53A0 /* I * M. */, 0x4000 /* FP16_C (2.0f). */,
++ 0xCA2C /* K * O. */, 0x615C /* L * P. */ };
++
++void exec_vmulx_f16 (void)
+{
-+ return (i == 0) ? VAL : __builtin_clz (i);
++#undef TEST_MSG
++#define TEST_MSG "VMULX (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vmulx_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULXQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vmulxq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
+}
+
-+int __attribute__ ((noinline, noclone))
-+fooclz2 (int i)
++int
++main (void)
+{
-+ return (i != 0) ? __builtin_clz (i) : VAL;
++ exec_vmulx_f16 ();
++ return 0;
+}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f16_1.c
+@@ -0,0 +1,452 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+unsigned int __attribute__ ((noinline, noclone))
-+fooclz3 (unsigned int i)
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (__builtin_inff ())
++#define C FP16_C (-34.8)
++#define D FP16_C (-__builtin_inff ())
++#define E FP16_C (-0.0)
++#define F FP16_C (19.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (0.0)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (-__builtin_inff ())
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-8)
++#define O FP16_C (-1.1)
++#define P FP16_C (-0.0)
++
++/* Expected results for vmulx_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */ };
++
++/* Expected results for vmulxq_lane. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* I * E. */,
++ 0x0000 /* J * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* L * E. */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */,
++ 0x4AAF /* I * F. */,
++ 0xE5D1 /* J * F. */,
++ 0xFC00 /* K * F. */,
++ 0x674F /* L * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */,
++ 0xC2B9 /* I * G. */,
++ 0x5DDA /* J * G. */,
++ 0x7C00 /* K * G. */,
++ 0xDF5A /* L * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* I * H. */,
++ 0x8000 /* J * H. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* L * H. */};
++
++/* Expected results for vmulx_laneq. */
++VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 4) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */ };
++
++VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 4) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */ };
++
++VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 4) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */ };
++
++VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 4) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */ };
++
++VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 4) []
++ = { 0x648F /* A * M. */,
++ 0x7C00 /* B * M. */,
++ 0xE9ED /* C * M. */,
++ 0xFC00 /* D * M. */ };
++
++VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 4) []
++ = { 0xD6B3 /* A * N. */,
++ 0xFC00 /* B * N. */,
++ 0x5C5A /* C * N. */,
++ 0x7C00 /* D * N. */ };
++
++VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 4) []
++ = { 0xCB5E /* A * O. */,
++ 0xFC00 /* B * O. */,
++ 0x50C9 /* C * O. */,
++ 0x7C00 /* D * O. */ };
++
++VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 4) []
++ = { 0x8000 /* A * P. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * P. */,
++ 0x4000 /* FP16_C (2.0f). */ };
++
++VECT_VAR_DECL (expected_laneq0_static, hfloat, 16, 8) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* I * E. */,
++ 0x0000 /* J * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* L * E. */ };
++
++VECT_VAR_DECL (expected_laneq1_static, hfloat, 16, 8) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */,
++ 0x4AAF /* I * F. */,
++ 0xE5D1 /* J * F. */,
++ 0xFC00 /* K * F. */,
++ 0x674F /* L * F. */ };
++
++VECT_VAR_DECL (expected_laneq2_static, hfloat, 16, 8) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */,
++ 0xC2B9 /* I * G. */,
++ 0x5DDA /* J * G. */,
++ 0x7C00 /* K * G. */,
++ 0xDF5A /* L * G. */ };
++
++VECT_VAR_DECL (expected_laneq3_static, hfloat, 16, 8) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* I * H. */,
++ 0x8000 /* J * H. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* L * H. */ };
++
++VECT_VAR_DECL (expected_laneq4_static, hfloat, 16, 8) []
++ = { 0x648F /* A * M. */,
++ 0x7C00 /* B * M. */,
++ 0xE9ED /* C * M. */,
++ 0xFC00 /* D * M. */,
++ 0x53A0 /* I * M. */,
++ 0xEEA3 /* J * M. */,
++ 0xFC00 /* K * M. */,
++ 0x702B /* L * M. */ };
++
++VECT_VAR_DECL (expected_laneq5_static, hfloat, 16, 8) []
++ = { 0xD6B3 /* A * N. */,
++ 0xFC00 /* B * N. */,
++ 0x5C5A /* C * N. */,
++ 0x7C00 /* D * N. */,
++ 0xC59A /* I * N. */,
++ 0x60E0 /* J * N. */,
++ 0x7C00 /* K * N. */,
++ 0xE220 /* L * N. */ };
++
++VECT_VAR_DECL (expected_laneq6_static, hfloat, 16, 8) []
++ = { 0xCB5E /* A * O. */,
++ 0xFC00 /* B * O. */,
++ 0x50C9 /* C * O. */,
++ 0x7C00 /* D * O. */,
++ 0xBA29 /* I * O. */,
++ 0x555C /* J * O. */,
++ 0x7C00 /* K * O. */,
++ 0xD6BC /* L * O. */ };
++
++VECT_VAR_DECL (expected_laneq7_static, hfloat, 16, 8) []
++ = { 0x8000 /* A * P. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * P. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* I * P. */,
++ 0x0000 /* J * P. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* L * P. */ };
++
++void exec_vmulx_lane_f16 (void)
+{
-+ return (i > 0) ? __builtin_clz (i) : VAL;
++#undef TEST_MSG
++#define TEST_MSG "VMULX_LANE (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_lane_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULXQ_LANE (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_lane_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 4), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULX_LANEQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 0);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 1);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 2);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 3);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq3_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 4);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq4_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 5);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq5_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 6);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq6_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 8), 7);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_laneq7_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULXQ_LANEQ (FP16)"
++ clean_results ();
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 0);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 1);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 2);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 3);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq3_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 4);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq4_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 5);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq5_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 6);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq6_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_laneq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8), 7);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_laneq7_static, "");
+}
+
+int
+main (void)
+{
-+ if (fooctz (0) != VAL || fooctz2 (0) != VAL || fooctz3 (0) != VAL
-+ || fooclz (0) != VAL || fooclz2 (0) != VAL || fooclz3 (0) != VAL)
-+ __builtin_abort ();
-+
++ exec_vmulx_lane_f16 ();
+ return 0;
+}
-\ No newline at end of file
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.c-torture/execute/pr66940.c
-@@ -0,0 +1,20 @@
-+long long __attribute__ ((noinline, noclone))
-+foo (long long ival)
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_n_f16_1.c
+@@ -0,0 +1,177 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (__builtin_inff ())
++#define C FP16_C (-34.8)
++#define D FP16_C (-__builtin_inff ())
++#define E FP16_C (-0.0)
++#define F FP16_C (19.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (0.0)
++
++float16_t elemE = E;
++float16_t elemF = F;
++float16_t elemG = G;
++float16_t elemH = H;
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-8)
++#define O FP16_C (-1.1)
++#define P FP16_C (-9.7)
++
++/* Expected results for vmulx_n. */
++VECT_VAR_DECL (expected0_static, hfloat, 16, 4) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 4) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 4) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 4) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */ };
++
++VECT_VAR_DECL (expected0_static, hfloat, 16, 8) []
++ = { 0x8000 /* A * E. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* C * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* I * E. */,
++ 0x0000 /* J * E. */,
++ 0x8000 /* K * E. */,
++ 0x8000 /* L * E. */ };
++
++VECT_VAR_DECL (expected1_static, hfloat, 16, 8) []
++ = { 0x5BFF /* A * F. */,
++ 0x7C00 /* B * F. */,
++ 0xE131 /* C * F. */,
++ 0xFC00 /* D * F. */,
++ 0x4AAF /* I * F. */,
++ 0xE5D1 /* J * F. */,
++ 0x5AB3 /* K * F. */,
++ 0x674F /* L * F. */ };
++
++VECT_VAR_DECL (expected2_static, hfloat, 16, 8) []
++ = { 0xD405 /* A * G. */,
++ 0xFC00 /* B * G. */,
++ 0x5939 /* C * G. */,
++ 0x7C00 /* D * G. */,
++ 0xC2B9 /* I * G. */,
++ 0x5DDA /* J * G. */,
++ 0xD2BD /* K * G. */,
++ 0xDF5A /* L * G. */ };
++
++VECT_VAR_DECL (expected3_static, hfloat, 16, 8) []
++ = { 0x0000 /* A * H. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x8000 /* C * H. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x0000 /* I * H. */,
++ 0x8000 /* J * H. */,
++ 0x0000 /* K * H. */,
++ 0x0000 /* L * H. */ };
++
++void exec_vmulx_n_f16 (void)
+{
-+ if (ival <= 0)
-+ return -0x7fffffffffffffffL - 1;
++#undef TEST_MSG
++#define TEST_MSG "VMULX_N (FP16)"
++ clean_results ();
+
-+ return 0x7fffffffffffffffL;
++ DECL_VARIABLE (vsrc_1, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemE);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemF);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemG);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 4)
++ = vmulx_n_f16 (VECT_VAR (vsrc_1, float, 16, 4), elemH);
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected3_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VMULXQ_N (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE (vsrc_1, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemE);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected0_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemF);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected1_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemG);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected2_static, "");
++
++ VECT_VAR (vector_res, float, 16, 8)
++ = vmulxq_n_f16 (VECT_VAR (vsrc_1, float, 16, 8), elemH);
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected3_static, "");
+}
+
+int
+main (void)
+{
-+ if (foo (-1) != (-0x7fffffffffffffffL - 1))
-+ __builtin_abort ();
++ exec_vmulx_n_f16 ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_f16_1.c
+@@ -0,0 +1,50 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 13.4
++#define B __builtin_inff ()
++#define C -34.8
++#define D -__builtin_inff ()
++#define E 63.1
++#define F 0.0
++#define G -4.8
++#define H 0.0
++
++#define I 0.7
++#define J -__builtin_inff ()
++#define K 11.23
++#define L 98
++#define M 87.1
++#define N -0.0
++#define O -1.1
++#define P 7
++
++float16_t input_1[] = { A, B, C, D, I, J, K, L };
++float16_t input_2[] = { E, F, G, H, M, N, O, P };
++uint16_t expected[] = { 0x629B /* A * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x5939 /* C * G. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x53A0 /* I * M. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0xCA2C /* K * O. */,
++ 0x615C /* L * P. */ };
++
++#define TEST_MSG "VMULXH_F16"
++#define INSN_NAME vmulxh_f16
++
++#define INPUT_1 input_1
++#define INPUT_2 input_2
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulxh_lane_f16_1.c
+@@ -0,0 +1,91 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+ if (foo (1) != 0x7fffffffffffffffL)
-+ __builtin_abort ();
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (13.4)
++#define B FP16_C (__builtin_inff ())
++#define C FP16_C (-34.8)
++#define D FP16_C (-__builtin_inff ())
++#define E FP16_C (63.1)
++#define F FP16_C (0.0)
++#define G FP16_C (-4.8)
++#define H FP16_C (0.0)
++
++#define I FP16_C (0.7)
++#define J FP16_C (-__builtin_inff ())
++#define K FP16_C (11.23)
++#define L FP16_C (98)
++#define M FP16_C (87.1)
++#define N FP16_C (-0.0)
++#define O FP16_C (-1.1)
++#define P FP16_C (7)
++
++extern void abort ();
++
++float16_t src1[8] = { A, B, C, D, I, J, K, L };
++VECT_VAR_DECL (src2, float, 16, 4) [] = { E, F, G, H };
++VECT_VAR_DECL (src2, float, 16, 8) [] = { E, F, G, H, M, N, O, P };
++
++/* Expected results for vmulxh_lane. */
++uint16_t expected[4] = { 0x629B /* A * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x5939 /* C * G. */,
++ 0xC000 /* FP16_C (-2.0f). */ };
++
++/* Expected results for vmulxh_laneq. */
++uint16_t expected_laneq[8] = { 0x629B /* A * E. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0x5939 /* C * G. */,
++ 0xC000 /* FP16_C (-2.0f). */,
++ 0x53A0 /* I * M. */,
++ 0x4000 /* FP16_C (2.0f). */,
++ 0xCA2C /* K * O. */,
++ 0x615C /* L * P. */ };
++
++void exec_vmulxh_lane_f16 (void)
++{
++#define CHECK_LANE(N)\
++ ret = vmulxh_lane_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 4), N);\
++ if (*(uint16_t *) &ret != expected[N])\
++ abort ();
++
++ DECL_VARIABLE(vsrc2, float, 16, 4);
++ VLOAD (vsrc2, src2, , float, f, 16, 4);
++ float16_t ret;
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++
++#undef CHECK_LANE
++#define CHECK_LANE(N)\
++ ret = vmulxh_laneq_f16 (src1[N], VECT_VAR (vsrc2, float, 16, 8), N);\
++ if (*(uint16_t *) &ret != expected_laneq[N])\
++ abort ();
++
++ DECL_VARIABLE(vsrc2, float, 16, 8);
++ VLOAD (vsrc2, src2, q, float, f, 16, 8);
++
++ CHECK_LANE(0)
++ CHECK_LANE(1)
++ CHECK_LANE(2)
++ CHECK_LANE(3)
++ CHECK_LANE(4)
++ CHECK_LANE(5)
++ CHECK_LANE(6)
++ CHECK_LANE(7)
++}
++
++int
++main (void)
++{
++ exec_vmulxh_lane_f16 ();
+ return 0;
+}
---- a/src/gcc/testsuite/gcc.dg/plugin/plugin.exp
-+++ b/src/gcc/testsuite/gcc.dg/plugin/plugin.exp
-@@ -87,6 +87,12 @@ foreach plugin_test $plugin_test_list {
- if ![runtest_file_p $runtests $plugin_src] then {
- continue
- }
-+ # Skip tail call tests on targets that do not have sibcall_epilogue.
-+ if {[regexp ".*must_tail_call_plugin.c" $plugin_src]
-+ && [istarget arm*-*-*]
-+ && [check_effective_target_arm_thumb1]} then {
-+ continue
-+ }
- set plugin_input_tests [lreplace $plugin_test 0 0]
- plugin-test-execute $plugin_src $plugin_input_tests
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vneg.c
+@@ -21,24 +21,53 @@ VECT_VAR_DECL(expected,int,32,4) [] = { 0x10, 0xf, 0xe, 0xd };
+ /* Expected results for float32 variants. Needs to be separated since
+ the generic test function does not test floating-point
+ versions. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0xc09a, 0xc09a,
++ 0xc09a, 0xc09a };
++VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0xc2cd, 0xc2cd,
++ 0xc2cd, 0xc2cd,
++ 0xc2cd, 0xc2cd,
++ 0xc2cd, 0xc2cd };
++#endif
+ VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0xc0133333, 0xc0133333 };
+ VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0xc059999a, 0xc059999a,
+ 0xc059999a, 0xc059999a };
+
+ void exec_vneg_f32(void)
+ {
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, float, 32, 4);
+
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 2.3f);
++ VDUP(vector, q, float, f, 16, 8, 3.4f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 2.3f);
+ VDUP(vector, q, float, f, 32, 4, 3.4f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_UNARY_OP(INSN_NAME, , float, f, 16, 4);
++ TEST_UNARY_OP(INSN_NAME, q, float, f, 16, 8);
++#endif
+ TEST_UNARY_OP(INSN_NAME, , float, f, 32, 2);
+ TEST_UNARY_OP(INSN_NAME, q, float, f, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
}
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
-@@ -0,0 +1,28 @@
-+/* { dg-do compile } */
-+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c
+@@ -0,0 +1,39 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++uint16_t expected[] =
++{
++ 0x8000 /* -0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0xc233 /* -3.099609 */,
++ 0xcd00 /* -20.000000 */,
++ 0xb666 /* -0.399902 */,
++ 0x409a /* 2.300781 */,
++ 0xbd52 /* -1.330078 */,
++ 0x479a /* 7.601562 */,
++ 0xb4f6 /* -0.310059 */,
++ 0xb55d /* -0.335205 */,
++ 0xb800 /* -0.500000 */,
++ 0xbc00 /* -1.000000 */,
++ 0xca91 /* -13.132812 */,
++ 0x464d /* 6.300781 */,
++ 0xcd00 /* -20.000000 */,
++ 0xfc00 /* -inf */,
++ 0x7c00 /* inf */
++};
++
++#define TEST_MSG "VNEGH_F16"
++#define INSN_NAME vnegh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
+@@ -21,6 +21,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector, uint, 8, 8);
+ DECL_VARIABLE(vector, uint, 16, 4);
+ DECL_VARIABLE(vector, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+
+ DECL_VARIABLE(vector_res, int, 8, 8);
+@@ -29,6 +32,9 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VARIABLE(vector_res, uint, 8, 8);
+ DECL_VARIABLE(vector_res, uint, 16, 4);
+ DECL_VARIABLE(vector_res, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+
+ clean_results ();
+@@ -40,6 +46,9 @@ void FNNAME (INSN_NAME) (void)
+ VLOAD(vector, buffer, , uint, u, 8, 8);
+ VLOAD(vector, buffer, , uint, u, 16, 4);
+ VLOAD(vector, buffer, , uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+
+ /* Apply a binary operator named INSN_NAME. */
+@@ -49,6 +58,9 @@ void FNNAME (INSN_NAME) (void)
+ TEST_VPXXX(INSN_NAME, uint, u, 8, 8);
+ TEST_VPXXX(INSN_NAME, uint, u, 16, 4);
+ TEST_VPXXX(INSN_NAME, uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VPXXX(INSN_NAME, float, f, 16, 4);
++#endif
+ TEST_VPXXX(INSN_NAME, float, f, 32, 2);
+
+ CHECK(TEST_MSG, int, 8, 8, PRIx32, expected, "");
+@@ -57,6 +69,9 @@ void FNNAME (INSN_NAME) (void)
+ CHECK(TEST_MSG, uint, 8, 8, PRIx32, expected, "");
+ CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, "");
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+ }
+
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
+@@ -14,6 +14,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
+ 0xe1, 0xe5, 0xe9, 0xed };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffe1, 0xffffffe1 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcfc0, 0xcec0, 0xcfc0, 0xcec0 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1f80000, 0xc1f80000 };
+
+ #include "vpXXX.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
+@@ -15,6 +15,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+ 0xf1, 0xf3, 0xf5, 0xf7 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff3, 0xfff1, 0xfff3 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff1, 0xfffffff1 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcb80, 0xca80, 0xcb80, 0xca80 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1700000, 0xc1700000 };
+
+ #include "vpXXX.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmin.c
+@@ -15,6 +15,9 @@ VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
+ 0xf0, 0xf2, 0xf4, 0xf6 };
+ VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff2, 0xfff0, 0xfff2 };
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb00, 0xcc00, 0xcb00 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1800000 };
+
+ #include "vpXXX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpminmaxnm_f16_1.c
+@@ -0,0 +1,114 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (123.4)
++#define B FP16_C (__builtin_nanf ("")) /* NaN */
++#define C FP16_C (-34.8)
++#define D FP16_C (1024)
++#define E FP16_C (663.1)
++#define F FP16_C (169.1)
++#define G FP16_C (-4.8)
++#define H FP16_C (-__builtin_nanf ("")) /* NaN */
++
++#define I FP16_C (0.7)
++#define J FP16_C (-78)
++#define K FP16_C (101.23)
++#define L FP16_C (-1098)
++#define M FP16_C (870.1)
++#define N FP16_C (-8781)
++#define O FP16_C (__builtin_inff ()) /* +Inf */
++#define P FP16_C (-__builtin_inff ()) /* -Inf */
++
++
++/* Expected results for vpminnm. */
++VECT_VAR_DECL (expected_min_static, hfloat, 16, 4) []
++ = { 0x57B6 /* A. */, 0xD05A /* C. */, 0x5949 /* F. */, 0xC4CD /* G. */ };
++
++VECT_VAR_DECL (expected_min_static, hfloat, 16, 8) []
++ = { 0x57B6 /* A. */, 0xD05A /* C. */, 0xD4E0 /* J. */, 0xE44A /* L. */,
++ 0x5949 /* F. */, 0xC4CD /* G. */, 0xF04A /* N. */, 0xFC00 /* P. */ };
++
++/* Expected results for vpmaxnm. */
++VECT_VAR_DECL (expected_max_static, hfloat, 16, 4) []
++ = { 0x57B6 /* A. */, 0x6400 /* D. */, 0x612E /* E. */, 0xC4CD /* G. */ };
++
++VECT_VAR_DECL (expected_max_static, hfloat, 16, 8) []
++ = { 0x57B6 /* A. */, 0x6400 /* D. */, 0x399A /* I. */, 0x5654 /* K. */,
++ 0x612E /* E. */, 0xC4CD /* G. */, 0x62CC /* M. */, 0x7C00 /* O. */ };
++
++void exec_vpminmaxnm_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VPMINNM (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 4);
++ DECL_VARIABLE(vsrc_2, float, 16, 4);
++ VECT_VAR_DECL (buf_src_1, float, 16, 4) [] = {A, B, C, D};
++ VECT_VAR_DECL (buf_src_2, float, 16, 4) [] = {E, F, G, H};
++ VLOAD (vsrc_1, buf_src_1, , float, f, 16, 4);
++ VLOAD (vsrc_2, buf_src_2, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vpminnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
++
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_min_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VPMINNMQ (FP16)"
++ clean_results ();
++
++ DECL_VARIABLE(vsrc_1, float, 16, 8);
++ DECL_VARIABLE(vsrc_2, float, 16, 8);
++ VECT_VAR_DECL (buf_src_1, float, 16, 8) [] = {A, B, C, D, I, J, K, L};
++ VECT_VAR_DECL (buf_src_2, float, 16, 8) [] = {E, F, G, H, M, N, O, P};
++ VLOAD (vsrc_1, buf_src_1, q, float, f, 16, 8);
++ VLOAD (vsrc_2, buf_src_2, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vpminnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
++
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_min_static, "");
++
++#undef TEST_MSG
++#define TEST_MSG "VPMAXNM (FP16)"
++ clean_results ();
+
-+int a[128];
-+extern int b[];
++ VECT_VAR (vector_res, float, 16, 4)
++ = vpmaxnm_f16 (VECT_VAR (vsrc_1, float, 16, 4),
++ VECT_VAR (vsrc_2, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+int bar (int *);
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_max_static, "");
+
-+int
-+foo (int n)
-+{
-+ int i;
++#undef TEST_MSG
++#define TEST_MSG "VPMAXNMQ (FP16)"
++ clean_results ();
+
-+ for (i = 0; i < n; i++)
-+ {
-+ unsigned char uc = (unsigned char)i;
-+ a[i] = i;
-+ b[uc] = 0;
-+ }
++ VECT_VAR (vector_res, float, 16, 8)
++ = vpmaxnmq_f16 (VECT_VAR (vsrc_1, float, 16, 8),
++ VECT_VAR (vsrc_2, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+ bar (a);
-+ return 0;
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_max_static, "");
+}
+
-+/* Address of array reference to b is scev. */
-+/* { dg-final { scan-tree-dump-times "use \[0-9\]\n address" 2 "ivopts" } } */
++int
++main (void)
++{
++ exec_vpminmaxnm_f16 ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpe.c
+@@ -7,6 +7,14 @@
+ VECT_VAR_DECL(expected_positive,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected_positive,uint,32,4) [] = { 0xbf000000, 0xbf000000,
+ 0xbf000000, 0xbf000000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_positive, hfloat, 16, 4) [] = { 0x3834, 0x3834,
++ 0x3834, 0x3834 };
++VECT_VAR_DECL(expected_positive, hfloat, 16, 8) [] = { 0x2018, 0x2018,
++ 0x2018, 0x2018,
++ 0x2018, 0x2018,
++ 0x2018, 0x2018 };
++#endif
+ VECT_VAR_DECL(expected_positive,hfloat,32,2) [] = { 0x3f068000, 0x3f068000 };
+ VECT_VAR_DECL(expected_positive,hfloat,32,4) [] = { 0x3c030000, 0x3c030000,
+ 0x3c030000, 0x3c030000 };
+@@ -15,24 +23,56 @@ VECT_VAR_DECL(expected_positive,hfloat,32,4) [] = { 0x3c030000, 0x3c030000,
+ VECT_VAR_DECL(expected_negative,uint,32,2) [] = { 0x80000000, 0x80000000 };
+ VECT_VAR_DECL(expected_negative,uint,32,4) [] = { 0xee800000, 0xee800000,
+ 0xee800000, 0xee800000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_negative, hfloat, 16, 4) [] = { 0xae64, 0xae64,
++ 0xae64, 0xae64 };
++VECT_VAR_DECL(expected_negative, hfloat, 16, 8) [] = { 0xa018, 0xa018,
++ 0xa018, 0xa018,
++ 0xa018, 0xa018,
++ 0xa018, 0xa018 };
++#endif
+ VECT_VAR_DECL(expected_negative,hfloat,32,2) [] = { 0xbdcc8000, 0xbdcc8000 };
+ VECT_VAR_DECL(expected_negative,hfloat,32,4) [] = { 0xbc030000, 0xbc030000,
+ 0xbc030000, 0xbc030000 };
+
+ /* Expected results with FP special values (NaN, infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
+ VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results with FP special values (zero, large value). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00 };
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
++#endif
+ VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x7f800000, 0x7f800000 };
+ VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results with FP special values (-0, -infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00};
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++#endif
+ VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
+ VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x80000000, 0x80000000,
+ 0x80000000, 0x80000000 };
+
+ /* Expected results with FP special large negative value. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp4, hfloat, 16, 4) [] = { 0x8000, 0x8000,
++ 0x8000, 0x8000 };
++#endif
+ VECT_VAR_DECL(expected_fp4,hfloat,32,2) [] = { 0x80000000, 0x80000000 };
+
+ #define TEST_MSG "VRECPE/VRECPEQ"
+@@ -50,11 +90,19 @@ void exec_vrecpe(void)
+ /* No need for 64 bits variants. */
+ DECL_VARIABLE(vector, uint, 32, 2);
+ DECL_VARIABLE(vector, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, float, 32, 4);
+
+ DECL_VARIABLE(vector_res, uint, 32, 2);
+ DECL_VARIABLE(vector_res, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
+@@ -62,88 +110,165 @@ void exec_vrecpe(void)
+
+ /* Choose init value arbitrarily, positive. */
+ VDUP(vector, , uint, u, 32, 2, 0x12345678);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 1.9f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 1.9f);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, q, float, f, 16, 8, 125.0f);
++#endif
+ VDUP(vector, q, uint, u, 32, 4, 0xABCDEF10);
+ VDUP(vector, q, float, f, 32, 4, 125.0f);
+
+ /* Apply the operator. */
+ TEST_VRECPE(, uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+ TEST_VRECPE(q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(q, float, f, 16, 8);
++#endif
+ TEST_VRECPE(q, float, f, 32, 4);
+
+ #define CMT " (positive input)"
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_positive, CMT);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_positive, CMT);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_positive, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_positive, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_positive, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_positive, CMT);
+
+ /* Choose init value arbitrarily,negative. */
+ VDUP(vector, , uint, u, 32, 2, 0xFFFFFFFF);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -10.0f);
++#endif
+ VDUP(vector, , float, f, 32, 2, -10.0f);
+ VDUP(vector, q, uint, u, 32, 4, 0x89081234);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, q, float, f, 16, 8, -125.0f);
++#endif
+ VDUP(vector, q, float, f, 32, 4, -125.0f);
+
+ /* Apply the operator. */
+ TEST_VRECPE(, uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+ TEST_VRECPE(q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(q, float, f, 16, 8);
++#endif
+ TEST_VRECPE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " (negative input)"
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected_negative, CMT);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_negative, CMT);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_negative, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_negative, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_negative, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_negative, CMT);
+
+ /* Test FP variants with special input values (NaN, infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, NAN);
++ VDUP(vector, q, float, f, 16, 8, HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, NAN);
+ VDUP(vector, q, float, f, 32, 4, HUGE_VALF);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++ TEST_VRECPE(q, float, f, 16, 8);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+ TEST_VRECPE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (NaN, infinity)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
+
+ /* Test FP variants with special input values (zero, large value). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 0.0f);
++ VDUP(vector, q, float, f, 16, 8, 8.97229e37f /*9.0e37f*/);
++#endif
+ VDUP(vector, , float, f, 32, 2, 0.0f);
+ VDUP(vector, q, float, f, 32, 4, 8.97229e37f /*9.0e37f*/);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++ TEST_VRECPE(q, float, f, 16, 8);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+ TEST_VRECPE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (zero, large value)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
+
+ /* Test FP variants with special input values (-0, -infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -0.0f);
++ VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, -0.0f);
+ VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++ TEST_VRECPE(q, float, f, 16, 8);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+ TEST_VRECPE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (-0, -infinity)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
+
+ /* Test FP variants with special input values (large negative value). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -9.0e37f);
++#endif
+ VDUP(vector, , float, f, 32, 2, -9.0e37f);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPE(, float, f, 16, 4);
++#endif
+ TEST_VRECPE(, float, f, 32, 2);
+
+ #undef CMT
+ #define CMT " FP special (large negative value)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp4, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp4, CMT);
+ }
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpeh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 123.4
++#define B 567.8
++#define C 34.8
++#define D 1024
++#define E 663.1
++#define F 144.0
++#define G 4.8
++#define H 77
++
++#define RECP_A 0x2028 /* 1/A. */
++#define RECP_B 0x1734 /* 1/B. */
++#define RECP_C 0x275C /* 1/C. */
++#define RECP_D 0x13FC /* 1/D. */
++#define RECP_E 0x162C /* 1/E. */
++#define RECP_F 0x1F18 /* 1/F. */
++#define RECP_G 0x32A8 /* 1/G. */
++#define RECP_H 0x22A4 /* 1/H. */
++
++float16_t input[] = { A, B, C, D, E, F, G, H };
++uint16_t expected[] = { RECP_A, RECP_B, RECP_C, RECP_D,
++ RECP_E, RECP_F, RECP_G, RECP_H };
++
++#define TEST_MSG "VRECPEH_F16"
++#define INSN_NAME vrecpeh_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecps.c
+@@ -4,22 +4,51 @@
+ #include <math.h>
+
+ /* Expected results with positive input. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xd70c, 0xd70c, 0xd70c, 0xd70c };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xcedc, 0xcedc, 0xcedc, 0xcedc,
++ 0xcedc, 0xcedc, 0xcedc, 0xcedc };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc2e19eb7, 0xc2e19eb7 };
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1db851f, 0xc1db851f,
+ 0xc1db851f, 0xc1db851f };
+
+ /* Expected results with FP special values (NaN). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++#endif
+ VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+ 0x7fc00000, 0x7fc00000 };
+
+ /* Expected results with FP special values (infinity, 0) and normal
+ values. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00 };
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x4000, 0x4000,
++ 0x4000, 0x4000,
++ 0x4000, 0x4000,
++ 0x4000, 0x4000 };
++#endif
+ VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
+ VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x40000000, 0x40000000,
+ 0x40000000, 0x40000000 };
+
+ /* Expected results with FP special values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0x4000, 0x4000,
++ 0x4000, 0x4000 };
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x4000, 0x4000,
++ 0x4000, 0x4000,
++ 0x4000, 0x4000,
++ 0x4000, 0x4000 };
++#endif
+ VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0x40000000, 0x40000000 };
+ VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x40000000, 0x40000000,
+ 0x40000000, 0x40000000 };
+@@ -38,74 +67,143 @@ void exec_vrecps(void)
+ VECT_VAR(vector_res, T1, W, N))
+
+ /* No need for integer variants. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector2, float, 32, 2);
+ DECL_VARIABLE(vector2, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
+ clean_results ();
+
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 12.9f);
++ VDUP(vector, q, float, f, 16, 8, 9.2f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 12.9f);
+ VDUP(vector, q, float, f, 32, 4, 9.2f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 8.9f);
++ VDUP(vector2, q, float, f, 16, 8, 3.2f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, 8.9f);
+ VDUP(vector2, q, float, f, 32, 4, 3.2f);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPS(, float, f, 16, 4);
++ TEST_VRECPS(q, float, f, 16, 8);
++#endif
+ TEST_VRECPS(, float, f, 32, 2);
+ TEST_VRECPS(q, float, f, 32, 4);
+
+ #define CMT " (positive input)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
+
+
+ /* Test FP variants with special input values (NaN). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, NAN);
++ VDUP(vector2, q, float, f, 16, 8, NAN);
++#endif
+ VDUP(vector, , float, f, 32, 2, NAN);
+ VDUP(vector2, q, float, f, 32, 4, NAN);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPS(, float, f, 16, 4);
++ TEST_VRECPS(q, float, f, 16, 8);
++#endif
+ TEST_VRECPS(, float, f, 32, 2);
+ TEST_VRECPS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (NaN)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
+
+
+ /* Test FP variants with special input values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, HUGE_VALF);
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, q, float, f, 16, 8, 3.2f); /* Restore a normal value. */
++#endif
+ VDUP(vector, , float, f, 32, 2, HUGE_VALF);
+ VDUP(vector, q, float, f, 32, 4, 0.0f);
+ VDUP(vector2, q, float, f, 32, 4, 3.2f); /* Restore a normal value. */
+
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPS(, float, f, 16, 4);
++ TEST_VRECPS(q, float, f, 16, 8);
++#endif
+ TEST_VRECPS(, float, f, 32, 2);
+ TEST_VRECPS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (infinity, 0) and normal value"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
+
+
+ /* Test FP variants with only special input values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, HUGE_VALF);
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, , float, f, 16, 4, 0.0f);
++ VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, HUGE_VALF);
+ VDUP(vector, q, float, f, 32, 4, 0.0f);
+ VDUP(vector2, , float, f, 32, 2, 0.0f);
+ VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
+
+
+ /* Apply the operator */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRECPS(, float, f, 16, 4);
++ TEST_VRECPS(q, float, f, 16, 8);
++#endif
+ TEST_VRECPS(, float, f, 32, 2);
+ TEST_VRECPS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (infinity, 0)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
+ }
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/scev-12.c
-@@ -0,0 +1,30 @@
-+/* { dg-do compile } */
-+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
-+
-+int a[128];
-+extern int b[];
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpsh_f16_1.c
+@@ -0,0 +1,50 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 12.4
++#define B -5.8
++#define C -3.8
++#define D 10
++#define E 66.1
++#define F 16.1
++#define G -4.8
++#define H -77
++
++#define I 0.7
++#define J -78
++#define K 10.23
++#define L 98
++#define M 87
++#define N -87.81
++#define O -1.1
++#define P 47.8
++
++float16_t input_1[] = { A, B, C, D, I, J, K, L };
++float16_t input_2[] = { E, F, G, H, M, N, O, P };
++uint16_t expected[] = { 0xE264 /* 2.0f - A * E. */,
++ 0x55F6 /* 2.0f - B * F. */,
++ 0xCC10 /* 2.0f - C * G. */,
++ 0x6208 /* 2.0f - D * H. */,
++ 0xD35D /* 2.0f - I * M. */,
++ 0xEEB0 /* 2.0f - J * N. */,
++ 0x4A9F /* 2.0f - K * O. */,
++ 0xEC93 /* 2.0f - L * P. */ };
++
++#define TEST_MSG "VRECPSH_F16"
++#define INSN_NAME vrecpsh_f16
++
++#define INPUT_1 input_1
++#define INPUT_2 input_2
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrecpxh_f16_1.c
+@@ -0,0 +1,32 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+int bar (int *);
++#include <arm_fp16.h>
+
-+int
-+foo (int x, int n)
-+{
-+ int i;
++/* Input values. */
+
-+ for (i = 0; i < n; i++)
++float16_t input[] = { 123.4, 567.8, 34.8, 1024, 663.1, 144.0, 4.8, 77 };
++/* Expected results are calculated by:
++ for (index = 0; index < 8; index++)
+ {
-+ unsigned char uc = (unsigned char)i;
-+ if (x)
-+ a[i] = i;
-+ b[uc] = 0;
-+ }
++ uint16_t src_cast = * (uint16_t *) &input[index];
++ * (uint16_t *) &expected[index] =
++ (src_cast & 0x8000) | (~src_cast & 0x7C00);
++ } */
++uint16_t expected[8] = { 0x2800, 0x1C00, 0x2C00, 0x1800,
++ 0x1C00, 0x2400, 0x3800, 0x2800 };
+
-+ bar (a);
-+ return 0;
-+}
++#define TEST_MSG "VRECPXH_F16"
++#define INSN_NAME vrecpxh_f16
+
-+/* Address of array reference to b is not scev. */
-+/* { dg-final { scan-tree-dump-times "use \[0-9\]\n address" 1 "ivopts" } } */
++#define INPUT input
++#define EXPECTED expected
+
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+@@ -21,6 +21,8 @@ VECT_VAR_DECL(expected_s8_8,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+ VECT_VAR_DECL(expected_s8_9,int,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
+ 0xf2, 0xff, 0xf3, 0xff };
++VECT_VAR_DECL(expected_s8_10,int,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca };
+
+ /* Expected results for vreinterpret_s16_xx. */
+ VECT_VAR_DECL(expected_s16_1,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
+@@ -32,6 +34,7 @@ VECT_VAR_DECL(expected_s16_6,int,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
+ VECT_VAR_DECL(expected_s16_7,int,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
+ VECT_VAR_DECL(expected_s16_8,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
+ VECT_VAR_DECL(expected_s16_9,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++VECT_VAR_DECL(expected_s16_10,int,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+ /* Expected results for vreinterpret_s32_xx. */
+ VECT_VAR_DECL(expected_s32_1,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+@@ -43,6 +46,7 @@ VECT_VAR_DECL(expected_s32_6,int,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected_s32_7,int,32,2) [] = { 0xfffffff0, 0xffffffff };
+ VECT_VAR_DECL(expected_s32_8,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+ VECT_VAR_DECL(expected_s32_9,int,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
++VECT_VAR_DECL(expected_s32_10,int,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
+
+ /* Expected results for vreinterpret_s64_xx. */
+ VECT_VAR_DECL(expected_s64_1,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
+@@ -54,6 +58,7 @@ VECT_VAR_DECL(expected_s64_6,int,64,1) [] = { 0xfffffff1fffffff0 };
+ VECT_VAR_DECL(expected_s64_7,int,64,1) [] = { 0xfffffffffffffff0 };
+ VECT_VAR_DECL(expected_s64_8,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
+ VECT_VAR_DECL(expected_s64_9,int,64,1) [] = { 0xfff3fff2fff1fff0 };
++VECT_VAR_DECL(expected_s64_10,int,64,1) [] = { 0xca80cb00cb80cc00 };
+
+ /* Expected results for vreinterpret_u8_xx. */
+ VECT_VAR_DECL(expected_u8_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+@@ -74,6 +79,8 @@ VECT_VAR_DECL(expected_u8_8,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+ VECT_VAR_DECL(expected_u8_9,uint,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
+ 0xf2, 0xff, 0xf3, 0xff };
++VECT_VAR_DECL(expected_u8_10,uint,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca };
+
+ /* Expected results for vreinterpret_u16_xx. */
+ VECT_VAR_DECL(expected_u16_1,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
+@@ -85,6 +92,7 @@ VECT_VAR_DECL(expected_u16_6,uint,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
+ VECT_VAR_DECL(expected_u16_7,uint,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
+ VECT_VAR_DECL(expected_u16_8,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
+ VECT_VAR_DECL(expected_u16_9,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++VECT_VAR_DECL(expected_u16_10,uint,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+ /* Expected results for vreinterpret_u32_xx. */
+ VECT_VAR_DECL(expected_u32_1,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+@@ -96,6 +104,7 @@ VECT_VAR_DECL(expected_u32_6,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
+ VECT_VAR_DECL(expected_u32_7,uint,32,2) [] = { 0xfffffff0, 0xffffffff };
+ VECT_VAR_DECL(expected_u32_8,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+ VECT_VAR_DECL(expected_u32_9,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
++VECT_VAR_DECL(expected_u32_10,uint,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
+
+ /* Expected results for vreinterpret_u64_xx. */
+ VECT_VAR_DECL(expected_u64_1,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
+@@ -107,6 +116,7 @@ VECT_VAR_DECL(expected_u64_6,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
+ VECT_VAR_DECL(expected_u64_7,uint,64,1) [] = { 0xfffffff1fffffff0 };
+ VECT_VAR_DECL(expected_u64_8,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
+ VECT_VAR_DECL(expected_u64_9,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
++VECT_VAR_DECL(expected_u64_10,uint,64,1) [] = { 0xca80cb00cb80cc00 };
+
+ /* Expected results for vreinterpret_p8_xx. */
+ VECT_VAR_DECL(expected_p8_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+@@ -127,6 +137,8 @@ VECT_VAR_DECL(expected_p8_8,poly,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 0xff };
+ VECT_VAR_DECL(expected_p8_9,poly,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
+ 0xf2, 0xff, 0xf3, 0xff };
++VECT_VAR_DECL(expected_p8_10,poly,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca };
+
+ /* Expected results for vreinterpret_p16_xx. */
+ VECT_VAR_DECL(expected_p16_1,poly,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
+@@ -138,6 +150,7 @@ VECT_VAR_DECL(expected_p16_6,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+ VECT_VAR_DECL(expected_p16_7,poly,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
+ VECT_VAR_DECL(expected_p16_8,poly,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
+ VECT_VAR_DECL(expected_p16_9,poly,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
++VECT_VAR_DECL(expected_p16_10,poly,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+ /* Expected results for vreinterpretq_s8_xx. */
+ VECT_VAR_DECL(expected_q_s8_1,int,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+@@ -176,6 +189,10 @@ VECT_VAR_DECL(expected_q_s8_9,int,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+ 0xf2, 0xff, 0xf3, 0xff,
+ 0xf4, 0xff, 0xf5, 0xff,
+ 0xf6, 0xff, 0xf7, 0xff };
++VECT_VAR_DECL(expected_q_s8_10,int,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca,
++ 0x00, 0xca, 0x80, 0xc9,
++ 0x00, 0xc9, 0x80, 0xc8 };
+
+ /* Expected results for vreinterpretq_s16_xx. */
+ VECT_VAR_DECL(expected_q_s16_1,int,16,8) [] = { 0xf1f0, 0xf3f2,
+@@ -214,6 +231,10 @@ VECT_VAR_DECL(expected_q_s16_9,int,16,8) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5,
+ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_s16_10,int,16,8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
+
+ /* Expected results for vreinterpretq_s32_xx. */
+ VECT_VAR_DECL(expected_q_s32_1,int,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+@@ -234,6 +255,8 @@ VECT_VAR_DECL(expected_q_s32_8,int,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+ 0xfbfaf9f8, 0xfffefdfc };
+ VECT_VAR_DECL(expected_q_s32_9,int,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
+ 0xfff5fff4, 0xfff7fff6 };
++VECT_VAR_DECL(expected_q_s32_10,int,32,4) [] = { 0xcb80cc00, 0xca80cb00,
++ 0xc980ca00, 0xc880c900 };
+
+ /* Expected results for vreinterpretq_s64_xx. */
+ VECT_VAR_DECL(expected_q_s64_1,int,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
+@@ -254,6 +277,8 @@ VECT_VAR_DECL(expected_q_s64_8,int,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
+ 0xfffefdfcfbfaf9f8 };
+ VECT_VAR_DECL(expected_q_s64_9,int,64,2) [] = { 0xfff3fff2fff1fff0,
+ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(expected_q_s64_10,int,64,2) [] = { 0xca80cb00cb80cc00,
++ 0xc880c900c980ca00 };
+
+ /* Expected results for vreinterpretq_u8_xx. */
+ VECT_VAR_DECL(expected_q_u8_1,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+@@ -292,6 +317,10 @@ VECT_VAR_DECL(expected_q_u8_9,uint,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
+ 0xf2, 0xff, 0xf3, 0xff,
+ 0xf4, 0xff, 0xf5, 0xff,
+ 0xf6, 0xff, 0xf7, 0xff };
++VECT_VAR_DECL(expected_q_u8_10,uint,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca,
++ 0x00, 0xca, 0x80, 0xc9,
++ 0x00, 0xc9, 0x80, 0xc8 };
+
+ /* Expected results for vreinterpretq_u16_xx. */
+ VECT_VAR_DECL(expected_q_u16_1,uint,16,8) [] = { 0xf1f0, 0xf3f2,
+@@ -330,6 +359,10 @@ VECT_VAR_DECL(expected_q_u16_9,uint,16,8) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5,
+ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_u16_10,uint,16,8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
+
+ /* Expected results for vreinterpretq_u32_xx. */
+ VECT_VAR_DECL(expected_q_u32_1,uint,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+@@ -350,6 +383,8 @@ VECT_VAR_DECL(expected_q_u32_8,uint,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+ 0xfbfaf9f8, 0xfffefdfc };
+ VECT_VAR_DECL(expected_q_u32_9,uint,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
+ 0xfff5fff4, 0xfff7fff6 };
++VECT_VAR_DECL(expected_q_u32_10,uint,32,4) [] = { 0xcb80cc00, 0xca80cb00,
++ 0xc980ca00, 0xc880c900 };
+
+ /* Expected results for vreinterpretq_u64_xx. */
+ VECT_VAR_DECL(expected_q_u64_1,uint,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
+@@ -370,6 +405,92 @@ VECT_VAR_DECL(expected_q_u64_8,uint,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
+ 0xfffefdfcfbfaf9f8 };
+ VECT_VAR_DECL(expected_q_u64_9,uint,64,2) [] = { 0xfff3fff2fff1fff0,
+ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(expected_q_u64_10,uint,64,2) [] = { 0xca80cb00cb80cc00,
++ 0xc880c900c980ca00 };
+
++/* Expected results for vreinterpretq_p8_xx. */
++VECT_VAR_DECL(expected_q_p8_1,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
++ 0xf4, 0xf5, 0xf6, 0xf7,
++ 0xf8, 0xf9, 0xfa, 0xfb,
++ 0xfc, 0xfd, 0xfe, 0xff };
++VECT_VAR_DECL(expected_q_p8_2,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
++ 0xf2, 0xff, 0xf3, 0xff,
++ 0xf4, 0xff, 0xf5, 0xff,
++ 0xf6, 0xff, 0xf7, 0xff };
++VECT_VAR_DECL(expected_q_p8_3,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xf2, 0xff, 0xff, 0xff,
++ 0xf3, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_q_p8_4,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_q_p8_5,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
++ 0xf4, 0xf5, 0xf6, 0xf7,
++ 0xf8, 0xf9, 0xfa, 0xfb,
++ 0xfc, 0xfd, 0xfe, 0xff };
++VECT_VAR_DECL(expected_q_p8_6,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
++ 0xf2, 0xff, 0xf3, 0xff,
++ 0xf4, 0xff, 0xf5, 0xff,
++ 0xf6, 0xff, 0xf7, 0xff };
++VECT_VAR_DECL(expected_q_p8_7,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xf2, 0xff, 0xff, 0xff,
++ 0xf3, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_q_p8_8,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_q_p8_9,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
++ 0xf2, 0xff, 0xf3, 0xff,
++ 0xf4, 0xff, 0xf5, 0xff,
++ 0xf6, 0xff, 0xf7, 0xff };
++VECT_VAR_DECL(expected_q_p8_10,poly,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
++ 0x00, 0xcb, 0x80, 0xca,
++ 0x00, 0xca, 0x80, 0xc9,
++ 0x00, 0xc9, 0x80, 0xc8 };
+
---- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
-@@ -25,6 +25,7 @@ f1 (int i, ...)
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -45,6 +46,7 @@ f2 (int i, ...)
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -60,6 +62,7 @@ f3 (int i, ...)
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 1 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 16 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[1-9\]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[1-9\]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
-@@ -78,6 +81,7 @@ f4 (int i, ...)
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -96,6 +100,7 @@ f5 (int i, ...)
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -116,6 +121,7 @@ f6 (int i, ...)
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 3 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -133,6 +139,7 @@ f7 (int i, ...)
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -152,6 +159,7 @@ f8 (int i, ...)
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -169,6 +177,7 @@ f9 (int i, ...)
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f9: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -188,6 +197,7 @@ f10 (int i, ...)
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -208,6 +218,7 @@ f11 (int i, ...)
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 3 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 0, needs to save (3|12|24) GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -228,6 +239,7 @@ f12 (int i, ...)
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 24 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and 3 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save 0 GPR units and 48 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -248,6 +260,7 @@ f13 (int i, ...)
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 24 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and 3 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save 0 GPR units and 48 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f13: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -268,6 +281,7 @@ f14 (int i, ...)
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 24 GPR units and 3" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 1 GPR units and 2 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f14: va_list escapes 0, needs to save \[1-9]\[0-9\]* GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -291,6 +305,7 @@ f15 (int i, ...)
- /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f15: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
++/* Expected results for vreinterpretq_p16_xx. */
++VECT_VAR_DECL(expected_q_p16_1,poly,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_p16_2,poly,16,8) [] = { 0xfff0, 0xfff1,
++ 0xfff2, 0xfff3,
++ 0xfff4, 0xfff5,
++ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_p16_3,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xfff1, 0xffff,
++ 0xfff2, 0xffff,
++ 0xfff3, 0xffff };
++VECT_VAR_DECL(expected_q_p16_4,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(expected_q_p16_5,poly,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_p16_6,poly,16,8) [] = { 0xfff0, 0xfff1,
++ 0xfff2, 0xfff3,
++ 0xfff4, 0xfff5,
++ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_p16_7,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xfff1, 0xffff,
++ 0xfff2, 0xffff,
++ 0xfff3, 0xffff };
++VECT_VAR_DECL(expected_q_p16_8,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(expected_q_p16_9,poly,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_p16_10,poly,16,8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
+
+ /* Expected results for vreinterpret_f32_xx. */
+ VECT_VAR_DECL(expected_f32_1,hfloat,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+@@ -382,6 +503,7 @@ VECT_VAR_DECL(expected_f32_7,hfloat,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected_f32_8,hfloat,32,2) [] = { 0xfffffff0, 0xffffffff };
+ VECT_VAR_DECL(expected_f32_9,hfloat,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
+ VECT_VAR_DECL(expected_f32_10,hfloat,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
++VECT_VAR_DECL(expected_f32_11,hfloat,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
+
+ /* Expected results for vreinterpretq_f32_xx. */
+ VECT_VAR_DECL(expected_q_f32_1,hfloat,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+@@ -404,8 +526,10 @@ VECT_VAR_DECL(expected_q_f32_9,hfloat,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
+ 0xfbfaf9f8, 0xfffefdfc };
+ VECT_VAR_DECL(expected_q_f32_10,hfloat,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
+ 0xfff5fff4, 0xfff7fff6 };
++VECT_VAR_DECL(expected_q_f32_11,hfloat,32,4) [] = { 0xcb80cc00, 0xca80cb00,
++ 0xc980ca00, 0xc880c900 };
+
+-/* Expected results for vreinterpretq_xx_f32. */
++/* Expected results for vreinterpret_xx_f32. */
+ VECT_VAR_DECL(expected_xx_f32_1,int,8,8) [] = { 0x0, 0x0, 0x80, 0xc1,
+ 0x0, 0x0, 0x70, 0xc1 };
+ VECT_VAR_DECL(expected_xx_f32_2,int,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
+@@ -419,6 +543,7 @@ VECT_VAR_DECL(expected_xx_f32_8,uint,64,1) [] = { 0xc1700000c1800000 };
+ VECT_VAR_DECL(expected_xx_f32_9,poly,8,8) [] = { 0x0, 0x0, 0x80, 0xc1,
+ 0x0, 0x0, 0x70, 0xc1 };
+ VECT_VAR_DECL(expected_xx_f32_10,poly,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
++VECT_VAR_DECL(expected_xx_f32_11,hfloat,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
+
+ /* Expected results for vreinterpretq_xx_f32. */
+ VECT_VAR_DECL(expected_q_xx_f32_1,int,8,16) [] = { 0x0, 0x0, 0x80, 0xc1,
+@@ -447,6 +572,62 @@ VECT_VAR_DECL(expected_q_xx_f32_9,poly,8,16) [] = { 0x0, 0x0, 0x80, 0xc1,
+ 0x0, 0x0, 0x50, 0xc1 };
+ VECT_VAR_DECL(expected_q_xx_f32_10,poly,16,8) [] = { 0x0, 0xc180, 0x0, 0xc170,
+ 0x0, 0xc160, 0x0, 0xc150 };
++VECT_VAR_DECL(expected_q_xx_f32_11,hfloat,16,8) [] = { 0x0, 0xc180, 0x0, 0xc170,
++ 0x0, 0xc160, 0x0, 0xc150 };
++
++/* Expected results for vreinterpret_f16_xx. */
++VECT_VAR_DECL(expected_f16_1,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
++VECT_VAR_DECL(expected_f16_2,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++VECT_VAR_DECL(expected_f16_3,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
++VECT_VAR_DECL(expected_f16_4,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL(expected_f16_5,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
++VECT_VAR_DECL(expected_f16_6,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++VECT_VAR_DECL(expected_f16_7,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
++VECT_VAR_DECL(expected_f16_8,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL(expected_f16_9,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
++VECT_VAR_DECL(expected_f16_10,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++
++/* Expected results for vreinterpretq_f16_xx. */
++VECT_VAR_DECL(expected_q_f16_1,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_f16_2,hfloat,16,8) [] = { 0xfff0, 0xfff1,
++ 0xfff2, 0xfff3,
++ 0xfff4, 0xfff5,
++ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_f16_3,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xfff1, 0xffff,
++ 0xfff2, 0xffff,
++ 0xfff3, 0xffff };
++VECT_VAR_DECL(expected_q_f16_4,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(expected_q_f16_5,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_f16_6,hfloat,16,8) [] = { 0xfff0, 0xfff1,
++ 0xfff2, 0xfff3,
++ 0xfff4, 0xfff5,
++ 0xfff6, 0xfff7 };
++VECT_VAR_DECL(expected_q_f16_7,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xfff1, 0xffff,
++ 0xfff2, 0xffff,
++ 0xfff3, 0xffff };
++VECT_VAR_DECL(expected_q_f16_8,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(expected_q_f16_9,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
++ 0xf5f4, 0xf7f6,
++ 0xf9f8, 0xfbfa,
++ 0xfdfc, 0xfffe };
++VECT_VAR_DECL(expected_q_f16_10,hfloat,16,8) [] = { 0xfff0, 0xfff1,
++ 0xfff2, 0xfff3,
++ 0xfff4, 0xfff5,
++ 0xfff6, 0xfff7 };
+
+ #define TEST_MSG "VREINTERPRET/VREINTERPRETQ"
+
+@@ -484,6 +665,10 @@ void exec_vreinterpret (void)
+
+ /* Initialize input "vector" from "buffer". */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
+
+@@ -497,6 +682,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, int, s, 8, 8, uint, u, 64, 1, expected_s8_7);
+ TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 8, 8, expected_s8_8);
+ TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 16, 4, expected_s8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, int, s, 8, 8, float, f, 16, 4, expected_s8_10);
++#endif
+
+ /* vreinterpret_s16_xx. */
+ TEST_VREINTERPRET(, int, s, 16, 4, int, s, 8, 8, expected_s16_1);
+@@ -508,6 +696,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, int, s, 16, 4, uint, u, 64, 1, expected_s16_7);
+ TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 8, 8, expected_s16_8);
+ TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 16, 4, expected_s16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, int, s, 16, 4, float, f, 16, 4, expected_s16_10);
++#endif
+
+ /* vreinterpret_s32_xx. */
+ TEST_VREINTERPRET(, int, s, 32, 2, int, s, 8, 8, expected_s32_1);
+@@ -519,6 +710,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, int, s, 32, 2, uint, u, 64, 1, expected_s32_7);
+ TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 8, 8, expected_s32_8);
+ TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 16, 4, expected_s32_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, int, s, 32, 2, float, f, 16, 4, expected_s32_10);
++#endif
+
+ /* vreinterpret_s64_xx. */
+ TEST_VREINTERPRET(, int, s, 64, 1, int, s, 8, 8, expected_s64_1);
+@@ -530,6 +724,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, int, s, 64, 1, uint, u, 64, 1, expected_s64_7);
+ TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 8, 8, expected_s64_8);
+ TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 16, 4, expected_s64_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, int, s, 64, 1, float, f, 16, 4, expected_s64_10);
++#endif
+
+ /* vreinterpret_u8_xx. */
+ TEST_VREINTERPRET(, uint, u, 8, 8, int, s, 8, 8, expected_u8_1);
+@@ -541,6 +738,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, uint, u, 8, 8, uint, u, 64, 1, expected_u8_7);
+ TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 8, 8, expected_u8_8);
+ TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 16, 4, expected_u8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, uint, u, 8, 8, float, f, 16, 4, expected_u8_10);
++#endif
+
+ /* vreinterpret_u16_xx. */
+ TEST_VREINTERPRET(, uint, u, 16, 4, int, s, 8, 8, expected_u16_1);
+@@ -552,6 +752,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, uint, u, 16, 4, uint, u, 64, 1, expected_u16_7);
+ TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 8, 8, expected_u16_8);
+ TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 16, 4, expected_u16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, uint, u, 16, 4, float, f, 16, 4, expected_u16_10);
++#endif
+
+ /* vreinterpret_u32_xx. */
+ TEST_VREINTERPRET(, uint, u, 32, 2, int, s, 8, 8, expected_u32_1);
+@@ -563,6 +766,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, uint, u, 32, 2, uint, u, 64, 1, expected_u32_7);
+ TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 8, 8, expected_u32_8);
+ TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 16, 4, expected_u32_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, uint, u, 32, 2, float, f, 16, 4, expected_u32_10);
++#endif
+
+ /* vreinterpret_u64_xx. */
+ TEST_VREINTERPRET(, uint, u, 64, 1, int, s, 8, 8, expected_u64_1);
+@@ -574,6 +780,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, uint, u, 64, 1, uint, u, 32, 2, expected_u64_7);
+ TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 8, 8, expected_u64_8);
+ TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 16, 4, expected_u64_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, uint, u, 64, 1, float, f, 16, 4, expected_u64_10);
++#endif
+
+ /* vreinterpret_p8_xx. */
+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, int, s, 8, 8, expected_p8_1);
+@@ -585,6 +794,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, uint, u, 32, 2, expected_p8_7);
+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, uint, u, 64, 1, expected_p8_8);
+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, poly, p, 16, 4, expected_p8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, float, f, 16, 4, expected_p8_10);
++#endif
+
+ /* vreinterpret_p16_xx. */
+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, int, s, 8, 8, expected_p16_1);
+@@ -596,6 +808,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, uint, u, 32, 2, expected_p16_7);
+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, uint, u, 64, 1, expected_p16_8);
+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, poly, p, 8, 8, expected_p16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, float, f, 16, 4, expected_p16_10);
++#endif
+
+ /* vreinterpretq_s8_xx. */
+ TEST_VREINTERPRET(q, int, s, 8, 16, int, s, 16, 8, expected_q_s8_1);
+@@ -607,6 +822,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, int, s, 8, 16, uint, u, 64, 2, expected_q_s8_7);
+ TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 8, 16, expected_q_s8_8);
+ TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 16, 8, expected_q_s8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, int, s, 8, 16, float, f, 16, 8, expected_q_s8_10);
++#endif
+
+ /* vreinterpretq_s16_xx. */
+ TEST_VREINTERPRET(q, int, s, 16, 8, int, s, 8, 16, expected_q_s16_1);
+@@ -618,6 +836,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, int, s, 16, 8, uint, u, 64, 2, expected_q_s16_7);
+ TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 8, 16, expected_q_s16_8);
+ TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 16, 8, expected_q_s16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, int, s, 16, 8, float, f, 16, 8, expected_q_s16_10);
++#endif
+
+ /* vreinterpretq_s32_xx. */
+ TEST_VREINTERPRET(q, int, s, 32, 4, int, s, 8, 16, expected_q_s32_1);
+@@ -629,6 +850,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, int, s, 32, 4, uint, u, 64, 2, expected_q_s32_7);
+ TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 8, 16, expected_q_s32_8);
+ TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 16, 8, expected_q_s32_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, int, s, 32, 4, float, f, 16, 8, expected_q_s32_10);
++#endif
+
+ /* vreinterpretq_s64_xx. */
+ TEST_VREINTERPRET(q, int, s, 64, 2, int, s, 8, 16, expected_q_s64_1);
+@@ -640,6 +864,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, int, s, 64, 2, uint, u, 64, 2, expected_q_s64_7);
+ TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 8, 16, expected_q_s64_8);
+ TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 16, 8, expected_q_s64_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, int, s, 64, 2, float, f, 16, 8, expected_q_s64_10);
++#endif
+
+ /* vreinterpretq_u8_xx. */
+ TEST_VREINTERPRET(q, uint, u, 8, 16, int, s, 8, 16, expected_q_u8_1);
+@@ -651,6 +878,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, uint, u, 8, 16, uint, u, 64, 2, expected_q_u8_7);
+ TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 8, 16, expected_q_u8_8);
+ TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 16, 8, expected_q_u8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, uint, u, 8, 16, float, f, 16, 8, expected_q_u8_10);
++#endif
+
+ /* vreinterpretq_u16_xx. */
+ TEST_VREINTERPRET(q, uint, u, 16, 8, int, s, 8, 16, expected_q_u16_1);
+@@ -662,6 +892,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, uint, u, 16, 8, uint, u, 64, 2, expected_q_u16_7);
+ TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 8, 16, expected_q_u16_8);
+ TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 16, 8, expected_q_u16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, uint, u, 16, 8, float, f, 16, 8, expected_q_u16_10);
++#endif
- /* We may be able to improve upon this after fixing PR66010/PR66013. */
- /* { dg-final { scan-tree-dump "f15: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
---- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-3.c
-@@ -24,6 +24,7 @@ f1 (int i, ...)
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -39,6 +40,7 @@ f2 (int i, ...)
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -57,6 +59,7 @@ f3 (int i, ...)
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -73,6 +76,7 @@ f4 (int i, ...)
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -89,6 +93,7 @@ f5 (int i, ...)
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -107,6 +112,7 @@ f6 (int i, ...)
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -123,6 +129,7 @@ f7 (int i, ...)
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -139,6 +146,7 @@ f8 (int i, ...)
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f8: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -155,6 +163,7 @@ f10 (int i, ...)
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f10: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -171,6 +180,7 @@ f11 (int i, ...)
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f11: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -187,6 +197,7 @@ f12 (int i, ...)
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f12: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
---- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-4.c
-@@ -27,6 +27,7 @@ f1 (int i, ...)
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -44,6 +45,7 @@ f2 (int i, ...)
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save 0 GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes \[01\], needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -67,6 +69,7 @@ f3 (int i, ...)
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units and 0 FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
-@@ -88,6 +91,7 @@ f4 (int i, ...)
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and \[1-9\]\[0-9\]* FPR units" "stdarg" { target { powerpc*-*-linux* && { powerpc_fprs && ilp32 } } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 8 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and 1 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 0 GPR units and 16 FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save \[148\] GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
---- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-5.c
-@@ -25,6 +25,7 @@ f1 (int i, ...)
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f1: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* vreinterpretq_u32_xx. */
+ TEST_VREINTERPRET(q, uint, u, 32, 4, int, s, 8, 16, expected_q_u32_1);
+@@ -673,6 +906,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, uint, u, 32, 4, uint, u, 64, 2, expected_q_u32_7);
+ TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 8, 16, expected_q_u32_8);
+ TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 16, 8, expected_q_u32_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, uint, u, 32, 4, float, f, 16, 8, expected_q_u32_10);
++#endif
- void
- f2 (int i, ...)
-@@ -38,6 +39,7 @@ f2 (int i, ...)
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and all FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f2: va_list escapes 0, needs to save all GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* vreinterpretq_u64_xx. */
+ TEST_VREINTERPRET(q, uint, u, 64, 2, int, s, 8, 16, expected_q_u64_1);
+@@ -684,6 +920,37 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, uint, u, 64, 2, uint, u, 32, 4, expected_q_u64_7);
+ TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 8, 16, expected_q_u64_8);
+ TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 16, 8, expected_q_u64_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, uint, u, 64, 2, float, f, 16, 8, expected_q_u64_10);
++#endif
++
++ /* vreinterpretq_p8_xx. */
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 8, 16, expected_q_p8_1);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 16, 8, expected_q_p8_2);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 32, 4, expected_q_p8_3);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 64, 2, expected_q_p8_4);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 8, 16, expected_q_p8_5);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 16, 8, expected_q_p8_6);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 32, 4, expected_q_p8_7);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 64, 2, expected_q_p8_8);
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, poly, p, 16, 8, expected_q_p8_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, float, f, 16, 8, expected_q_p8_10);
++#endif
++
++ /* vreinterpretq_p16_xx. */
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 8, 16, expected_q_p16_1);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 16, 8, expected_q_p16_2);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 32, 4, expected_q_p16_3);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 64, 2, expected_q_p16_4);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 8, 16, expected_q_p16_5);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 16, 8, expected_q_p16_6);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 32, 4, expected_q_p16_7);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 64, 2, expected_q_p16_8);
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, poly, p, 8, 16, expected_q_p16_9);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, float, f, 16, 8, expected_q_p16_10);
++#endif
- /* Here va_arg can be executed at most as many times as va_start. */
- void
-@@ -56,6 +58,7 @@ f3 (int i, ...)
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 0 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 32 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 1 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f3: va_list escapes 0, needs to save 8 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* vreinterpret_f32_xx. */
+ TEST_VREINTERPRET_FP(, float, f, 32, 2, int, s, 8, 8, expected_f32_1);
+@@ -696,6 +963,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET_FP(, float, f, 32, 2, uint, u, 64, 1, expected_f32_8);
+ TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 8, 8, expected_f32_9);
+ TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 16, 4, expected_f32_10);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(, float, f, 32, 2, float, f, 16, 4, expected_f32_11);
++#endif
+
+ /* vreinterpretq_f32_xx. */
+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, int, s, 8, 16, expected_q_f32_1);
+@@ -708,6 +978,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, uint, u, 64, 2, expected_q_f32_8);
+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 8, 16, expected_q_f32_9);
+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 16, 8, expected_q_f32_10);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(q, float, f, 32, 4, float, f, 16, 8, expected_q_f32_11);
++#endif
+
+ /* vreinterpret_xx_f32. */
+ TEST_VREINTERPRET(, int, s, 8, 8, float, f, 32, 2, expected_xx_f32_1);
+@@ -720,6 +993,9 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(, uint, u, 64, 1, float, f, 32, 2, expected_xx_f32_8);
+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, float, f, 32, 2, expected_xx_f32_9);
+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, float, f, 32, 2, expected_xx_f32_10);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, float, f, 32, 2, expected_xx_f32_11);
++#endif
+
+ /* vreinterpretq_xx_f32. */
+ TEST_VREINTERPRET(q, int, s, 8, 16, float, f, 32, 4, expected_q_xx_f32_1);
+@@ -732,6 +1008,33 @@ void exec_vreinterpret (void)
+ TEST_VREINTERPRET(q, uint, u, 64, 2, float, f, 32, 4, expected_q_xx_f32_8);
+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, float, f, 32, 4, expected_q_xx_f32_9);
+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, float, f, 32, 4, expected_q_xx_f32_10);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, float, f, 32, 4, expected_q_xx_f32_11);
++
++ /* vreinterpret_f16_xx. */
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 8, 8, expected_f16_1);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 16, 4, expected_f16_2);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 32, 2, expected_f16_3);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 64, 1, expected_f16_4);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 8, 8, expected_f16_5);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 16, 4, expected_f16_6);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 32, 2, expected_f16_7);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 64, 1, expected_f16_8);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 8, 8, expected_f16_9);
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 16, 4, expected_f16_10);
++
++ /* vreinterpretq_f16_xx. */
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 8, 16, expected_q_f16_1);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 16, 8, expected_q_f16_2);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 32, 4, expected_q_f16_3);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 64, 2, expected_q_f16_4);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 8, 16, expected_q_f16_5);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 16, 8, expected_q_f16_6);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 32, 4, expected_q_f16_7);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 64, 2, expected_q_f16_8);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 8, 16, expected_q_f16_9);
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 16, 8, expected_q_f16_10);
++#endif
+ }
+
+ int main (void)
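
[Editorial sketch, not part of the patch: every f16 case added above is wrapped in a __ARM_FP16_FORMAT_IEEE / __ARM_FP16_FORMAT_ALTERNATIVE guard because float16x4_t/float16x8_t and the *_f16 vreinterpret intrinsics are only usable when the target defines an __fp16 format. A minimal, self-contained example of the same guard pattern, assuming an arm_neon.h target, looks like this:

  #include <arm_neon.h>

  int8x8_t
  roundtrip_through_f16 (int8x8_t x)
  {
  #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
    /* Pure bit-pattern reinterpretation, no value conversion.  */
    float16x4_t f = vreinterpret_f16_s8 (x);
    return vreinterpret_s8_f16 (f);
  #else
    /* No __fp16 format on this target: the f16 variants are unavailable,
       so just pass the value through.  */
    return x;
  #endif
  }

The guarded path round-trips the original bits unchanged, which is why the expected_f16_* arrays above simply mirror the input byte patterns.]
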
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
+@@ -0,0 +1,166 @@
++/* This file contains tests for the vreinterpret *p128 intrinsics. */
++
++/* { dg-require-effective-target arm_crypto_ok } */
++/* { dg-add-options arm_crypto } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++/* Expected results: vreinterpretq_p128_*. */
++VECT_VAR_DECL(vreint_expected_q_p128_s8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p128_s16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p128_s32,poly,64,2) [] = { 0xfffffff1fffffff0,
++ 0xfffffff3fffffff2 };
++VECT_VAR_DECL(vreint_expected_q_p128_s64,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p128_u8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p128_u16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p128_u32,poly,64,2) [] = { 0xfffffff1fffffff0,
++ 0xfffffff3fffffff2 };
++VECT_VAR_DECL(vreint_expected_q_p128_u64,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p128_p8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p128_p16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p128_f32,poly,64,2) [] = { 0xc1700000c1800000,
++ 0xc1500000c1600000 };
++VECT_VAR_DECL(vreint_expected_q_p128_f16,poly,64,2) [] = { 0xca80cb00cb80cc00,
++ 0xc880c900c980ca00 };
++
++/* Expected results: vreinterpretq_*_p128. */
++VECT_VAR_DECL(vreint_expected_q_s8_p128,int,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_s16_p128,int,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_s32_p128,int,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_s64_p128,int,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_u8_p128,uint,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_u16_p128,uint,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_u32_p128,uint,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_u64_p128,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p8_p128,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_p16_p128,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_p64_p128,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_f32_p128,hfloat,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_f16_p128,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++
++int main (void)
++{
++ DECL_VARIABLE_128BITS_VARIANTS(vreint_vector);
++ DECL_VARIABLE(vreint_vector, poly, 64, 2);
++ DECL_VARIABLE_128BITS_VARIANTS(vreint_vector_res);
++ DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
++
++ clean_results ();
++
++ TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vreint_vector, buffer);
++ VLOAD(vreint_vector, buffer, q, poly, p, 64, 2);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ VLOAD(vreint_vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vreint_vector, buffer, q, float, f, 32, 4);
++
++ /* vreinterpretq_p128_* tests. */
++#undef TEST_MSG
++#define TEST_MSG "VREINTERPRETQ_P128_*"
++
++ /* Since there is no way to store a poly128_t value, convert to
++ poly64x2_t before storing. This means that we are not able to
++ test vreinterpretq_p128* alone, and that errors in
++ vreinterpretq_p64_p128 could compensate for errors in
++ vreinterpretq_p128*. */
++#define TEST_VREINTERPRET128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
++ VECT_VAR(vreint_vector_res, poly, 64, 2) = vreinterpretq_p64_p128( \
++ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS))); \
++ vst1##Q##_##T2##64(VECT_VAR(result, poly, 64, 2), \
++ VECT_VAR(vreint_vector_res, poly, 64, 2)); \
++ CHECK(TEST_MSG, T1, 64, 2, PRIx##64, EXPECTED, "");
++
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 8, 16, vreint_expected_q_p128_s8);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 16, 8, vreint_expected_q_p128_s16);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 32, 4, vreint_expected_q_p128_s32);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 64, 2, vreint_expected_q_p128_s64);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 8, 16, vreint_expected_q_p128_u8);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 16, 8, vreint_expected_q_p128_u16);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 32, 4, vreint_expected_q_p128_u32);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 64, 2, vreint_expected_q_p128_u64);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, poly, p, 8, 16, vreint_expected_q_p128_p8);
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, poly, p, 16, 8, vreint_expected_q_p128_p16);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, float, f, 16, 8, vreint_expected_q_p128_f16);
++#endif
++ TEST_VREINTERPRET128(q, poly, p, 128, 1, float, f, 32, 4, vreint_expected_q_p128_f32);
++
++ /* vreinterpretq_*_p128 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VREINTERPRETQ_*_P128"
++
++ /* Since there is no way to load a poly128_t value, load a
++ poly64x2_t and convert it to poly128_t. This means that we are
++ not able to test vreinterpretq_*_p128 alone, and that errors in
++ vreinterpretq_p128_p64 could compensate for errors in
++ vreinterpretq_*_p128*. */
++#define TEST_VREINTERPRET_FROM_P128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
++ VECT_VAR(vreint_vector_res, T1, W, N) = \
++ vreinterpret##Q##_##T2##W##_##TS2##WS( \
++ vreinterpretq_p128_p64(VECT_VAR(vreint_vector, TS1, 64, 2))); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vreint_vector_res, T1, W, N)); \
++ CHECK(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++
++#define TEST_VREINTERPRET_FP_FROM_P128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
++ VECT_VAR(vreint_vector_res, T1, W, N) = \
++ vreinterpret##Q##_##T2##W##_##TS2##WS( \
++ vreinterpretq_p128_p64(VECT_VAR(vreint_vector, TS1, 64, 2))); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vreint_vector_res, T1, W, N)); \
++ CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++
++ TEST_VREINTERPRET_FROM_P128(q, int, s, 8, 16, poly, p, 128, 1, vreint_expected_q_s8_p128);
++ TEST_VREINTERPRET_FROM_P128(q, int, s, 16, 8, poly, p, 128, 1, vreint_expected_q_s16_p128);
++ TEST_VREINTERPRET_FROM_P128(q, int, s, 32, 4, poly, p, 128, 1, vreint_expected_q_s32_p128);
++ TEST_VREINTERPRET_FROM_P128(q, int, s, 64, 2, poly, p, 128, 1, vreint_expected_q_s64_p128);
++ TEST_VREINTERPRET_FROM_P128(q, uint, u, 8, 16, poly, p, 128, 1, vreint_expected_q_u8_p128);
++ TEST_VREINTERPRET_FROM_P128(q, uint, u, 16, 8, poly, p, 128, 1, vreint_expected_q_u16_p128);
++ TEST_VREINTERPRET_FROM_P128(q, uint, u, 32, 4, poly, p, 128, 1, vreint_expected_q_u32_p128);
++ TEST_VREINTERPRET_FROM_P128(q, uint, u, 64, 2, poly, p, 128, 1, vreint_expected_q_u64_p128);
++ TEST_VREINTERPRET_FROM_P128(q, poly, p, 8, 16, poly, p, 128, 1, vreint_expected_q_p8_p128);
++ TEST_VREINTERPRET_FROM_P128(q, poly, p, 16, 8, poly, p, 128, 1, vreint_expected_q_p16_p128);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP_FROM_P128(q, float, f, 16, 8, poly, p, 128, 1, vreint_expected_q_f16_p128);
++#endif
++ TEST_VREINTERPRET_FP_FROM_P128(q, float, f, 32, 4, poly, p, 128, 1, vreint_expected_q_f32_p128);
++
++ return 0;
++}
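
[Editorial sketch, not part of the patch: the two comments above, before TEST_VREINTERPRET128 and TEST_VREINTERPRET_FROM_P128, describe a round-trip through poly64x2_t that the test needs because a poly128_t value cannot be loaded or stored directly with vld1q/vst1q. A minimal standalone illustration of that technique, assuming a compiler and flags that provide the crypto extension (as the test's dg-add-options arm_crypto does), is:

  #include <arm_neon.h>
  #include <stdio.h>
  #include <inttypes.h>

  int
  main (void)
  {
    /* Roughly the input pattern the harness uses: bytes 0xf0 .. 0xff.  */
    int8_t in[16] = { -16, -15, -14, -13, -12, -11, -10, -9,
                      -8, -7, -6, -5, -4, -3, -2, -1 };
    int8x16_t v = vld1q_s8 (in);

    /* Intrinsic under test; poly128_t itself cannot be stored, so go back
       through poly64x2_t purely to make the bits observable.  */
    poly128_t p = vreinterpretq_p128_s8 (v);
    poly64x2_t as_p64 = vreinterpretq_p64_p128 (p);

    poly64_t out[2];
    vst1q_p64 (out, as_p64);
    printf ("0x%016" PRIx64 " 0x%016" PRIx64 "\n",
            (uint64_t) out[1], (uint64_t) out[0]);

    /* The load direction uses the same trick in reverse: load as poly64x2_t,
       widen to poly128_t, then reinterpret to the destination type.  */
    int16x8_t back = vreinterpretq_s16_p128 (vreinterpretq_p128_p64 (as_p64));
    (void) back;
    return 0;
  }

As the comments note, this means an error in vreinterpretq_p64_p128 (or vreinterpretq_p128_p64) could mask an error in the intrinsic actually being exercised; the test accepts that limitation.]
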
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
+@@ -0,0 +1,212 @@
++/* This file contains tests for the vreinterpret *p64 intrinsics. */
++
++/* { dg-require-effective-target arm_crypto_ok } */
++/* { dg-add-options arm_crypto } */
++
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++/* Expected results: vreinterpret_p64_*. */
++VECT_VAR_DECL(vreint_expected_p64_s8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
++VECT_VAR_DECL(vreint_expected_p64_s16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
++VECT_VAR_DECL(vreint_expected_p64_s32,poly,64,1) [] = { 0xfffffff1fffffff0 };
++VECT_VAR_DECL(vreint_expected_p64_s64,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vreint_expected_p64_u8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
++VECT_VAR_DECL(vreint_expected_p64_u16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
++VECT_VAR_DECL(vreint_expected_p64_u32,poly,64,1) [] = { 0xfffffff1fffffff0 };
++VECT_VAR_DECL(vreint_expected_p64_u64,poly,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vreint_expected_p64_p8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
++VECT_VAR_DECL(vreint_expected_p64_p16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
++VECT_VAR_DECL(vreint_expected_p64_f32,poly,64,1) [] = { 0xc1700000c1800000 };
++VECT_VAR_DECL(vreint_expected_p64_f16,poly,64,1) [] = { 0xca80cb00cb80cc00 };
++
++/* Expected results: vreinterpretq_p64_*. */
++VECT_VAR_DECL(vreint_expected_q_p64_s8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p64_s16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p64_s32,poly,64,2) [] = { 0xfffffff1fffffff0,
++ 0xfffffff3fffffff2 };
++VECT_VAR_DECL(vreint_expected_q_p64_s64,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p64_u8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p64_u16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p64_u32,poly,64,2) [] = { 0xfffffff1fffffff0,
++ 0xfffffff3fffffff2 };
++VECT_VAR_DECL(vreint_expected_q_p64_u64,poly,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p64_p8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
++ 0xfffefdfcfbfaf9f8 };
++VECT_VAR_DECL(vreint_expected_q_p64_p16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
++ 0xfff7fff6fff5fff4 };
++VECT_VAR_DECL(vreint_expected_q_p64_f32,poly,64,2) [] = { 0xc1700000c1800000,
++ 0xc1500000c1600000 };
++VECT_VAR_DECL(vreint_expected_q_p64_f16,poly,64,2) [] = { 0xca80cb00cb80cc00,
++ 0xc880c900c980ca00 };
++
++/* Expected results: vreinterpret_*_p64. */
++VECT_VAR_DECL(vreint_expected_s8_p64,int,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_s16_p64,int,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_s32_p64,int,32,2) [] = { 0xfffffff0, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_s64_p64,int,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vreint_expected_u8_p64,uint,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_u16_p64,uint,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_u32_p64,uint,32,2) [] = { 0xfffffff0, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_u64_p64,uint,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(vreint_expected_p8_p64,poly,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_p16_p64,poly,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_f32_p64,hfloat,32,2) [] = { 0xfffffff0, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_f16_p64,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++
++/* Expected results: vreinterpretq_*_p64. */
++VECT_VAR_DECL(vreint_expected_q_s8_p64,int,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_s16_p64,int,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_s32_p64,int,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_s64_p64,int,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_u8_p64,uint,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_u16_p64,uint,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_u32_p64,uint,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_u64_p64,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0xfffffffffffffff1 };
++VECT_VAR_DECL(vreint_expected_q_p8_p64,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xf1, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(vreint_expected_q_p16_p64,poly,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++VECT_VAR_DECL(vreint_expected_q_f32_p64,hfloat,32,4) [] = { 0xfffffff0, 0xffffffff,
++ 0xfffffff1, 0xffffffff };
++VECT_VAR_DECL(vreint_expected_q_f16_p64,hfloat,16,8) [] = { 0xfff0, 0xffff,
++ 0xffff, 0xffff,
++ 0xfff1, 0xffff,
++ 0xffff, 0xffff };
++
++int main (void)
++{
++#define TEST_VREINTERPRET(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
++ VECT_VAR(vreint_vector_res, T1, W, N) = \
++ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vreint_vector_res, T1, W, N)); \
++ CHECK(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++
++#define TEST_VREINTERPRET_FP(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
++ VECT_VAR(vreint_vector_res, T1, W, N) = \
++ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
++ VECT_VAR(vreint_vector_res, T1, W, N)); \
++ CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++
++ DECL_VARIABLE_ALL_VARIANTS(vreint_vector);
++ DECL_VARIABLE(vreint_vector, poly, 64, 1);
++ DECL_VARIABLE(vreint_vector, poly, 64, 2);
++ DECL_VARIABLE_ALL_VARIANTS(vreint_vector_res);
++ DECL_VARIABLE(vreint_vector_res, poly, 64, 1);
++ DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
++
++ clean_results ();
++
++ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vreint_vector, buffer);
++ VLOAD(vreint_vector, buffer, , poly, p, 64, 1);
++ VLOAD(vreint_vector, buffer, q, poly, p, 64, 2);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ VLOAD(vreint_vector, buffer, , float, f, 16, 4);
++ VLOAD(vreint_vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vreint_vector, buffer, , float, f, 32, 2);
++ VLOAD(vreint_vector, buffer, q, float, f, 32, 4);
++
++ /* vreinterpret_p64_* tests. */
++#undef TEST_MSG
++#define TEST_MSG "VREINTERPRET_P64_*"
++ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 8, 8, vreint_expected_p64_s8);
++ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 16, 4, vreint_expected_p64_s16);
++ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 32, 2, vreint_expected_p64_s32);
++ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 64, 1, vreint_expected_p64_s64);
++ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 8, 8, vreint_expected_p64_u8);
++ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 16, 4, vreint_expected_p64_u16);
++ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 32, 2, vreint_expected_p64_u32);
++ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 64, 1, vreint_expected_p64_u64);
++ TEST_VREINTERPRET(, poly, p, 64, 1, poly, p, 8, 8, vreint_expected_p64_p8);
++ TEST_VREINTERPRET(, poly, p, 64, 1, poly, p, 16, 4, vreint_expected_p64_p16);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(, poly, p, 64, 1, float, f, 16, 4, vreint_expected_p64_f16);
++#endif
++ TEST_VREINTERPRET(, poly, p, 64, 1, float, f, 32, 2, vreint_expected_p64_f32);
++
++ /* vreinterpretq_p64_* tests. */
++#undef TEST_MSG
++#define TEST_MSG "VREINTERPRETQ_P64_*"
++ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 8, 16, vreint_expected_q_p64_s8);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 16, 8, vreint_expected_q_p64_s16);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 32, 4, vreint_expected_q_p64_s32);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 64, 2, vreint_expected_q_p64_s64);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 8, 16, vreint_expected_q_p64_u8);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 16, 8, vreint_expected_q_p64_u16);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 32, 4, vreint_expected_q_p64_u32);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 64, 2, vreint_expected_q_p64_u64);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, poly, p, 8, 16, vreint_expected_q_p64_p8);
++ TEST_VREINTERPRET(q, poly, p, 64, 2, poly, p, 16, 8, vreint_expected_q_p64_p16);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET(q, poly, p, 64, 2, float, f, 16, 8, vreint_expected_q_p64_f16);
++#endif
++ TEST_VREINTERPRET(q, poly, p, 64, 2, float, f, 32, 4, vreint_expected_q_p64_f32);
++
++ /* vreinterpret_*_p64 tests. */
++#undef TEST_MSG
++#define TEST_MSG "VREINTERPRET_*_P64"
++
++ TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 64, 1, vreint_expected_s8_p64);
++ TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 64, 1, vreint_expected_s16_p64);
++ TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 64, 1, vreint_expected_s32_p64);
++ TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 64, 1, vreint_expected_s64_p64);
++ TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 64, 1, vreint_expected_u8_p64);
++ TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 64, 1, vreint_expected_u16_p64);
++ TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 64, 1, vreint_expected_u32_p64);
++ TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 64, 1, vreint_expected_u64_p64);
++ TEST_VREINTERPRET(, poly, p, 8, 8, poly, p, 64, 1, vreint_expected_p8_p64);
++ TEST_VREINTERPRET(, poly, p, 16, 4, poly, p, 64, 1, vreint_expected_p16_p64);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 64, 1, vreint_expected_f16_p64);
++#endif
++ TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 64, 1, vreint_expected_f32_p64);
++ TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 64, 2, vreint_expected_q_s8_p64);
++ TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 64, 2, vreint_expected_q_s16_p64);
++ TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 64, 2, vreint_expected_q_s32_p64);
++ TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 64, 2, vreint_expected_q_s64_p64);
++ TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 64, 2, vreint_expected_q_u8_p64);
++ TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 64, 2, vreint_expected_q_u16_p64);
++ TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 64, 2, vreint_expected_q_u32_p64);
++ TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 64, 2, vreint_expected_q_u64_p64);
++ TEST_VREINTERPRET(q, poly, p, 8, 16, poly, p, 64, 2, vreint_expected_q_p8_p64);
++ TEST_VREINTERPRET(q, poly, p, 16, 8, poly, p, 64, 2, vreint_expected_q_p16_p64);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 64, 2, vreint_expected_q_f16_p64);
++#endif
++ TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 64, 2, vreint_expected_q_f32_p64);
++
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrev.c
+@@ -63,6 +63,10 @@ VECT_VAR_DECL(expected_vrev64,uint,32,2) [] = { 0xfffffff1, 0xfffffff0 };
+ VECT_VAR_DECL(expected_vrev64,poly,8,8) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
+ 0xf3, 0xf2, 0xf1, 0xf0 };
+ VECT_VAR_DECL(expected_vrev64,poly,16,4) [] = { 0xfff3, 0xfff2, 0xfff1, 0xfff0 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected_vrev64, hfloat, 16, 4) [] = { 0xca80, 0xcb00,
++ 0xcb80, 0xcc00 };
++#endif
+ VECT_VAR_DECL(expected_vrev64,hfloat,32,2) [] = { 0xc1700000, 0xc1800000 };
+ VECT_VAR_DECL(expected_vrev64,int,8,16) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
+ 0xf3, 0xf2, 0xf1, 0xf0,
+@@ -86,6 +90,12 @@ VECT_VAR_DECL(expected_vrev64,poly,8,16) [] = { 0xf7, 0xf6, 0xf5, 0xf4,
+ 0xfb, 0xfa, 0xf9, 0xf8 };
+ VECT_VAR_DECL(expected_vrev64,poly,16,8) [] = { 0xfff3, 0xfff2, 0xfff1, 0xfff0,
+ 0xfff7, 0xfff6, 0xfff5, 0xfff4 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected_vrev64, hfloat, 16, 8) [] = { 0xca80, 0xcb00,
++ 0xcb80, 0xcc00,
++ 0xc880, 0xc900,
++ 0xc980, 0xca00 };
++#endif
+ VECT_VAR_DECL(expected_vrev64,hfloat,32,4) [] = { 0xc1700000, 0xc1800000,
+ 0xc1500000, 0xc1600000 };
- void
- f4 (int i, ...)
-@@ -74,6 +77,7 @@ f4 (int i, ...)
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 16 GPR units and 16 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 24 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 2 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f4: va_list escapes 0, needs to save 24 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+@@ -104,6 +114,10 @@ void exec_vrev (void)
- void
- f5 (int i, ...)
-@@ -88,6 +92,7 @@ f5 (int i, ...)
- /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 16 GPR units and 0 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 32 GPR units and 1" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save (4|2) GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f5: va_list escapes 0, needs to save 16 GPR units and 0 FPR units" "stdarg" { target aarch64*-*-* } } } */
+ /* Initialize input "vector" from "buffer". */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD (vector, buffer, , float, f, 16, 4);
++ VLOAD (vector, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector, buffer, , float, f, 32, 2);
+ VLOAD(vector, buffer, q, float, f, 32, 4);
- void
- f6 (int i, ...)
-@@ -102,6 +107,7 @@ f6 (int i, ...)
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 32 GPR units and 3" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save (3|2) GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f6: va_list escapes 0, needs to save 8 GPR units and 32 FPR units" "stdarg" { target aarch64*-*-* } } } */
+@@ -187,6 +201,12 @@ void exec_vrev (void)
+ CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vrev64, "");
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_vrev64, "");
- void
- f7 (int i, ...)
-@@ -116,3 +122,4 @@ f7 (int i, ...)
- /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 0 GPR units and 64 FPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && { ! { ia32 || llp64 } } } } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 32 GPR units and 2" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 2 GPR units and 0 FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "f7: va_list escapes 0, needs to save 0 GPR units and 64 FPR units" "stdarg" { target aarch64*-*-* } } } */
---- a/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c
-+++ b/src/gcc/testsuite/gcc.dg/tree-ssa/stdarg-6.c
-@@ -30,6 +30,7 @@ bar (int x, char const *y, ...)
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target { powerpc*-*-linux* && ilp32 } } } } */
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target s390*-*-linux* } } } */
-+/* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target aarch64*-*-* } } } */
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target ia64-*-* } } } */
- /* { dg-final { scan-tree-dump "bar: va_list escapes 1, needs to save all GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
++#if defined (FP16_SUPPORTED)
++ TEST_VREV (, float, f, 16, 4, 64);
++ TEST_VREV (q, float, f, 16, 8, 64);
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_vrev64, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_vrev64, "");
++#endif
+ TEST_VREV(, float, f, 32, 2, 64);
+ TEST_VREV(q, float, f, 32, 4, 64);
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_vrev64, "");
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.dg/vect/pr57206.c
-@@ -0,0 +1,11 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target vect_float } */
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
+@@ -0,0 +1,24 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+void bad0(float * d, unsigned int n)
-+{
-+ unsigned int i;
-+ for (i=n; i>0; --i)
-+ d[n-i] = 0.0;
-+}
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
-@@ -81,7 +81,7 @@ extern size_t strlen(const char *);
- abort(); \
- } \
- } \
-- fprintf(stderr, "CHECKED %s\n", MSG); \
-+ fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \
- }
-
- /* Floating-point variant. */
-@@ -110,7 +110,7 @@ extern size_t strlen(const char *);
- abort(); \
- } \
- } \
-- fprintf(stderr, "CHECKED %s\n", MSG); \
-+ fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \
- }
-
- /* Clean buffer with a non-zero pattern to help diagnose buffer
-@@ -133,10 +133,16 @@ static ARRAY(result, uint, 32, 2);
- static ARRAY(result, uint, 64, 1);
- static ARRAY(result, poly, 8, 8);
- static ARRAY(result, poly, 16, 4);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+static ARRAY(result, poly, 64, 1);
-+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- static ARRAY(result, float, 16, 4);
- #endif
- static ARRAY(result, float, 32, 2);
-+#ifdef __aarch64__
-+static ARRAY(result, float, 64, 1);
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
+#endif
- static ARRAY(result, int, 8, 16);
- static ARRAY(result, int, 16, 8);
- static ARRAY(result, int, 32, 4);
-@@ -147,6 +153,9 @@ static ARRAY(result, uint, 32, 4);
- static ARRAY(result, uint, 64, 2);
- static ARRAY(result, poly, 8, 16);
- static ARRAY(result, poly, 16, 8);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+static ARRAY(result, poly, 64, 2);
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
++
++#define INSN vrnd
++#define TEST_MSG "VRND"
++
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
+@@ -0,0 +1,63 @@
++#define FNNAME1(NAME) exec_ ## NAME
++#define FNNAME(NAME) FNNAME1 (NAME)
++
++void FNNAME (INSN) (void)
++{
++ /* vector_res = vrndX (vector), then store the result. */
++#define TEST_VRND2(INSN, Q, T1, T2, W, N) \
++ VECT_VAR (vector_res, T1, W, N) = \
++ INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N)); \
++ vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N), \
++ VECT_VAR (vector_res, T1, W, N))
++
++ /* Two auxiliary macros are necessary to expand INSN. */
++#define TEST_VRND1(INSN, Q, T1, T2, W, N) \
++ TEST_VRND2 (INSN, Q, T1, T2, W, N)
++
++#define TEST_VRND(Q, T1, T2, W, N) \
++ TEST_VRND1 (INSN, Q, T1, T2, W, N)
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- static ARRAY(result, float, 16, 8);
- #endif
-@@ -169,6 +178,7 @@ extern ARRAY(expected, poly, 8, 8);
- extern ARRAY(expected, poly, 16, 4);
- extern ARRAY(expected, hfloat, 16, 4);
- extern ARRAY(expected, hfloat, 32, 2);
-+extern ARRAY(expected, hfloat, 64, 1);
- extern ARRAY(expected, int, 8, 16);
- extern ARRAY(expected, int, 16, 8);
- extern ARRAY(expected, int, 32, 4);
-@@ -335,7 +345,8 @@ extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
- strlen(COMMENT) > 0 ? " " COMMENT : ""); \
- abort(); \
- } \
-- fprintf(stderr, "CHECKED CUMULATIVE SAT %s\n", MSG); \
-+ fprintf(stderr, "CHECKED CUMULATIVE SAT %s %s\n", \
-+ STR(VECT_TYPE(T, W, N)), MSG); \
- }
-
- #define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment) \
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h
-@@ -118,6 +118,10 @@ VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
- PAD(buffer_pad, uint, 32, 2);
- VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
- PAD(buffer_pad, uint, 64, 1);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+VECT_VAR_DECL_INIT(buffer, poly, 64, 1);
-+PAD(buffer_pad, poly, 64, 1);
++ DECL_VARIABLE (vector, float, 32, 2);
++ DECL_VARIABLE (vector, float, 32, 4);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- VECT_VAR_DECL_INIT(buffer, float, 16, 4);
- PAD(buffer_pad, float, 16, 4);
-@@ -144,6 +148,10 @@ VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
- PAD(buffer_pad, poly, 8, 16);
- VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
- PAD(buffer_pad, poly, 16, 8);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+VECT_VAR_DECL_INIT(buffer, poly, 64, 2);
-+PAD(buffer_pad, poly, 64, 2);
++ DECL_VARIABLE (vector_res, float, 32, 2);
++ DECL_VARIABLE (vector_res, float, 32, 4);
++
++ clean_results ();
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VLOAD (vector, buffer, , float, f, 16, 4);
++ VLOAD (vector, buffer, q, float, f, 16, 8);
+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- VECT_VAR_DECL_INIT(buffer, float, 16, 8);
- PAD(buffer_pad, float, 16, 8);
-@@ -178,6 +186,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
- VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
- VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
- VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 1);
-+VECT_VAR_DECL(buffer_dup_pad, poly, 64, 1);
++ VLOAD (vector, buffer, , float, f, 32, 2);
++ VLOAD (vector, buffer, q, float, f, 32, 4);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRND ( , float, f, 16, 4);
++ TEST_VRND (q, float, f, 16, 8);
+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- VECT_VAR_DECL_INIT4(buffer_dup, float, 16, 4);
- VECT_VAR_DECL(buffer_dup_pad, float, 16, 4);
-@@ -205,6 +217,10 @@ VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
- VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
- VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
- VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
-+#if defined (__ARM_FEATURE_CRYPTO)
-+VECT_VAR_DECL_INIT4(buffer_dup, poly, 64, 2);
-+VECT_VAR_DECL(buffer_dup_pad, poly, 64, 2);
++ TEST_VRND ( , float, f, 32, 2);
++ TEST_VRND (q, float, f, 32, 4);
++
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected, "");
+#endif
- #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
- VECT_VAR_DECL_INIT(buffer_dup, float, 16, 8);
- VECT_VAR_DECL(buffer_dup_pad, float, 16, 8);
++ CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
++ CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
++}
++
++int
++main (void)
++{
++ FNNAME (INSN) ();
++ return 0;
++}
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
-@@ -0,0 +1,663 @@
-+/* This file contains tests for all the *p64 intrinsics, except for
-+ vreinterpret which have their own testcase. */
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
+@@ -0,0 +1,25 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+/* { dg-require-effective-target arm_crypto_ok } */
-+/* { dg-add-options arm_crypto } */
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
++
++#define INSN vrnda
++#define TEST_MSG "VRNDA"
++
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VRNDAH_F16"
++#define INSN_NAME vrndah_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc700 /* -7.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VRNDH_F16"
++#define INSN_NAME vrndh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndi_f16_1.c
+@@ -0,0 +1,71 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
-+/* Expected results: vbsl. */
-+VECT_VAR_DECL(vbsl_expected,poly,64,1) [] = { 0xfffffff1 };
-+VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfffffff1,
-+ 0xfffffff1 };
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (123.4)
++#define RNDI_A 0x57B0 /* FP16_C (123). */
++#define B FP16_C (-567.5)
++#define RNDI_B 0xE070 /* FP16_C (-568). */
++#define C FP16_C (-34.8)
++#define RNDI_C 0xD060 /* FP16_C (-35). */
++#define D FP16_C (1024)
++#define RNDI_D 0x6400 /* FP16_C (1024). */
++#define E FP16_C (663.1)
++#define RNDI_E 0x612E /* FP16_C (663). */
++#define F FP16_C (169.1)
++#define RNDI_F 0x5948 /* FP16_C (169). */
++#define G FP16_C (-4.8)
++#define RNDI_G 0xC500 /* FP16_C (-5). */
++#define H FP16_C (77.5)
++#define RNDI_H 0x54E0 /* FP16_C (78). */
++
++/* Expected results for vrndi. */
++VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
++ = { RNDI_A, RNDI_B, RNDI_C, RNDI_D };
++
++VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
++ = { RNDI_A, RNDI_B, RNDI_C, RNDI_D, RNDI_E, RNDI_F, RNDI_G, RNDI_H };
++
++void exec_vrndi_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VRNDI (FP16)"
++ clean_results ();
+
-+/* Expected results: vceq. */
-+VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vrndi_f16 (VECT_VAR (vsrc, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+/* Expected results: vcombine. */
-+VECT_VAR_DECL(vcombine_expected,poly,64,2) [] = { 0xfffffffffffffff0, 0x88 };
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
-+/* Expected results: vcreate. */
-+VECT_VAR_DECL(vcreate_expected,poly,64,1) [] = { 0x123456789abcdef0 };
++#undef TEST_MSG
++#define TEST_MSG "VRNDIQ (FP16)"
++ clean_results ();
+
-+/* Expected results: vdup_lane. */
-+VECT_VAR_DECL(vdup_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vdup_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff0 };
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vrndiq_f16 (VECT_VAR (vsrc, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+/* Expected results: vdup_n. */
-+VECT_VAR_DECL(vdup_n_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vdup_n_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vdup_n_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vdup_n_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vdup_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vdup_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
-+ 0xfffffffffffffff2 };
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
++}
+
-+/* Expected results: vext. */
-+VECT_VAR_DECL(vext_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
++int
++main (void)
++{
++ exec_vrndi_f16 ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+/* Expected results: vget_low. */
-+VECT_VAR_DECL(vget_low_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
++#include <arm_fp16.h>
+
-+/* Expected results: vld1. */
-+VECT_VAR_DECL(vld1_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld1_expected,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
+
-+/* Expected results: vld1_dup. */
-+VECT_VAR_DECL(vld1_dup_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld1_dup_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld1_dup_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld1_dup_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld1_dup_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vld1_dup_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
-+ 0xfffffffffffffff2 };
++#define TEST_MSG "VRNDIH_F16"
++#define INSN_NAME vrndih_f16
+
-+/* Expected results: vld1_lane. */
-+VECT_VAR_DECL(vld1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xaaaaaaaaaaaaaaaa };
++#define EXPECTED expected
+
-+/* Expected results: vldX. */
-+VECT_VAR_DECL(vld2_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld2_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld3_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld3_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld3_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vld4_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld4_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld4_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vld4_expected_3,poly,64,1) [] = { 0xfffffffffffffff3 };
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+/* Expected results: vldX_dup. */
-+VECT_VAR_DECL(vld2_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld2_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld3_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld3_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld3_dup_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vld4_dup_expected_0,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vld4_dup_expected_1,poly,64,1) [] = { 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vld4_dup_expected_2,poly,64,1) [] = { 0xfffffffffffffff2 };
-+VECT_VAR_DECL(vld4_dup_expected_3,poly,64,1) [] = { 0xfffffffffffffff3 };
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
+@@ -0,0 +1,25 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+/* Expected results: vsli. */
-+VECT_VAR_DECL(vsli_expected,poly,64,1) [] = { 0x10 };
-+VECT_VAR_DECL(vsli_expected,poly,64,2) [] = { 0x7ffffffffffff0,
-+ 0x7ffffffffffff1 };
-+VECT_VAR_DECL(vsli_expected_max_shift,poly,64,1) [] = { 0x7ffffffffffffff0 };
-+VECT_VAR_DECL(vsli_expected_max_shift,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+/* Expected results: vsri. */
-+VECT_VAR_DECL(vsri_expected,poly,64,1) [] = { 0xe000000000000000 };
-+VECT_VAR_DECL(vsri_expected,poly,64,2) [] = { 0xfffffffffffff800,
-+ 0xfffffffffffff800 };
-+VECT_VAR_DECL(vsri_expected_max_shift,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vsri_expected_max_shift,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
+
-+/* Expected results: vst1_lane. */
-+VECT_VAR_DECL(vst1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vst1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0x3333333333333333 };
++#define INSN vrndm
++#define TEST_MSG "VRNDM"
++
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++#include <arm_fp16.h>
++
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc200 /* -3.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc700 /* -7.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
++
++#define TEST_MSG "VRNDMH_F16"
++#define INSN_NAME vrndmh_f16
++
++#define EXPECTED expected
+
-+int main (void)
-+{
-+ int i;
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+ /* vbsl_p64 tests. */
-+#define TEST_MSG "VBSL/VBSLQ"
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
+@@ -0,0 +1,25 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+#define TEST_VBSL(T3, Q, T1, T2, W, N) \
-+ VECT_VAR(vbsl_vector_res, T1, W, N) = \
-+ vbsl##Q##_##T2##W(VECT_VAR(vbsl_vector_first, T3, W, N), \
-+ VECT_VAR(vbsl_vector, T1, W, N), \
-+ VECT_VAR(vbsl_vector2, T1, W, N)); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vbsl_vector_res, T1, W, N))
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ DECL_VARIABLE(vbsl_vector, poly, 64, 1);
-+ DECL_VARIABLE(vbsl_vector, poly, 64, 2);
-+ DECL_VARIABLE(vbsl_vector2, poly, 64, 1);
-+ DECL_VARIABLE(vbsl_vector2, poly, 64, 2);
-+ DECL_VARIABLE(vbsl_vector_res, poly, 64, 1);
-+ DECL_VARIABLE(vbsl_vector_res, poly, 64, 2);
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
+
-+ DECL_VARIABLE(vbsl_vector_first, uint, 64, 1);
-+ DECL_VARIABLE(vbsl_vector_first, uint, 64, 2);
++#define INSN vrndn
++#define TEST_MSG "VRNDN"
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+ VLOAD(vbsl_vector, buffer, , poly, p, 64, 1);
-+ VLOAD(vbsl_vector, buffer, q, poly, p, 64, 2);
++#include <arm_fp16.h>
+
-+ VDUP(vbsl_vector2, , poly, p, 64, 1, 0xFFFFFFF3);
-+ VDUP(vbsl_vector2, q, poly, p, 64, 2, 0xFFFFFFF3);
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
+
-+ VDUP(vbsl_vector_first, , uint, u, 64, 1, 0xFFFFFFF2);
-+ VDUP(vbsl_vector_first, q, uint, u, 64, 2, 0xFFFFFFF2);
++#define TEST_MSG "VRNDNH_F16"
++#define INSN_NAME vrndnh_f16
+
-+ TEST_VBSL(uint, , poly, p, 64, 1);
-+ TEST_VBSL(uint, q, poly, p, 64, 2);
++#define EXPECTED expected
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vbsl_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vbsl_expected, "");
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+ /* vceq_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VCEQ"
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
+@@ -0,0 +1,24 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+#define TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N) \
-+ VECT_VAR(vceq_vector_res, T3, W, N) = \
-+ INSN##Q##_##T2##W(VECT_VAR(vceq_vector, T1, W, N), \
-+ VECT_VAR(vceq_vector2, T1, W, N)); \
-+ vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vceq_vector_res, T3, W, N))
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+#define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N) \
-+ TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
+
-+ DECL_VARIABLE(vceq_vector, poly, 64, 1);
-+ DECL_VARIABLE(vceq_vector2, poly, 64, 1);
-+ DECL_VARIABLE(vceq_vector_res, uint, 64, 1);
++#define INSN vrndp
++#define TEST_MSG "VRNDP"
+
-+ CLEAN(result, uint, 64, 1);
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+ VLOAD(vceq_vector, buffer, , poly, p, 64, 1);
++#include <arm_fp16.h>
+
-+ VDUP(vceq_vector2, , poly, p, 64, 1, 0x88);
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4400 /* 4.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0xc700 /* -7.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4b00 /* 14.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
+
-+ TEST_VCOMP(vceq, , poly, p, uint, 64, 1);
++#define TEST_MSG "VRNDPH_F16"
++#define INSN_NAME vrndph_f16
+
-+ CHECK(TEST_MSG, uint, 64, 1, PRIx64, vceq_expected, "");
++#define EXPECTED expected
+
-+ /* vcombine_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VCOMBINE"
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+#define TEST_VCOMBINE(T1, T2, W, N, N2) \
-+ VECT_VAR(vcombine_vector128, T1, W, N2) = \
-+ vcombine_##T2##W(VECT_VAR(vcombine_vector64_a, T1, W, N), \
-+ VECT_VAR(vcombine_vector64_b, T1, W, N)); \
-+ vst1q_##T2##W(VECT_VAR(result, T1, W, N2), VECT_VAR(vcombine_vector128, T1, W, N2))
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
+@@ -0,0 +1,24 @@
++/* { dg-require-effective-target arm_v8_neon_hw } */
++/* { dg-add-options arm_v8_neon } */
+
-+ DECL_VARIABLE(vcombine_vector64_a, poly, 64, 1);
-+ DECL_VARIABLE(vcombine_vector64_b, poly, 64, 1);
-+ DECL_VARIABLE(vcombine_vector128, poly, 64, 2);
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ CLEAN(result, poly, 64, 2);
++/* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
++VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
++VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
++ 0xc1600000, 0xc1500000 };
+
-+ VLOAD(vcombine_vector64_a, buffer, , poly, p, 64, 1);
++#define INSN vrndx
++#define TEST_MSG "VRNDX"
+
-+ VDUP(vcombine_vector64_b, , poly, p, 64, 1, 0x88);
++#include "vrndX.inc"
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+ TEST_VCOMBINE(poly, p, 64, 1, 2);
++#include <arm_fp16.h>
+
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx16, vcombine_expected, "");
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x4000 /* 2.000000 */,
++ 0x4200 /* 3.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0xc000 /* -2.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0xc800 /* -8.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x0000 /* 0.000000 */,
++ 0x3c00 /* 1.000000 */,
++ 0x4a80 /* 13.000000 */,
++ 0xc600 /* -6.000000 */,
++ 0x4d00 /* 20.000000 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
+
-+ /* vcreate_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VCREATE"
++#define TEST_MSG "VRNDNH_F16"
++#define INSN_NAME vrndnh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrte.c
+@@ -7,6 +7,11 @@
+ VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffff, 0xffffffff };
+ VECT_VAR_DECL(expected,uint,32,4) [] = { 0x9c800000, 0x9c800000,
+ 0x9c800000, 0x9c800000 };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0x324c, 0x324c, 0x324c, 0x324c };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0x3380, 0x3380, 0x3380, 0x3380,
++ 0x3380, 0x3380, 0x3380, 0x3380 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x3e498000, 0x3e498000 };
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x3e700000, 0x3e700000,
+ 0x3e700000, 0x3e700000 };
+@@ -22,17 +27,39 @@ VECT_VAR_DECL(expected_2,uint,32,4) [] = { 0xed000000, 0xed000000,
+ 0xed000000, 0xed000000 };
+
+ /* Expected results with FP special inputs values (NaNs, ...). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00,
++ 0x7c00, 0x7c00 };
++#endif
+ VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x7f800000, 0x7f800000,
+ 0x7f800000, 0x7f800000 };
+
+ /* Expected results with FP special inputs values
+ (negative, infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x0, 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0 };
++#endif
+ VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results with FP special inputs values
+ (-0, -infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00 };
++VECT_VAR_DECL(expected_fp3, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++#endif
+ VECT_VAR_DECL(expected_fp3,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
+ VECT_VAR_DECL(expected_fp3,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+ 0x7fc00000, 0x7fc00000 };
+@@ -50,32 +77,60 @@ void exec_vrsqrte(void)
+ VECT_VAR(vector_res, T1, W, N))
+
+ DECL_VARIABLE(vector, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 4);
+
+ DECL_VARIABLE(vector_res, uint, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, uint, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
+ clean_results ();
+
+ /* Choose init value arbitrarily. */
+ VDUP(vector, , uint, u, 32, 2, 0x12345678);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 25.799999f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 25.799999f);
+ VDUP(vector, q, uint, u, 32, 4, 0xABCDEF10);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, q, float, f, 16, 8, 18.2f);
++#endif
+ VDUP(vector, q, float, f, 32, 4, 18.2f);
+
+ /* Apply the operator. */
+ TEST_VRSQRTE(, uint, u, 32, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTE(, float, f, 16, 4);
++#endif
+ TEST_VRSQRTE(, float, f, 32, 2);
+ TEST_VRSQRTE(q, uint, u, 32, 4);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTE(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTE(q, float, f, 32, 4);
+
+ #define CMT ""
+ CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, CMT);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, CMT);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
+
+@@ -110,42 +165,78 @@ void exec_vrsqrte(void)
+
+
+ /* Test FP variants with special input values (NaNs, ...). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, NAN);
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++#endif
+ VDUP(vector, , float, f, 32, 2, NAN);
+ VDUP(vector, q, float, f, 32, 4, 0.0f);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTE(, float, f, 16, 4);
++ TEST_VRSQRTE(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTE(, float, f, 32, 2);
+ TEST_VRSQRTE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (NaN, 0)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
+
+
+ /* Test FP variants with special input values (negative, infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -1.0f);
++ VDUP(vector, q, float, f, 16, 8, HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, -1.0f);
+ VDUP(vector, q, float, f, 32, 4, HUGE_VALF);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTE(, float, f, 16, 4);
++ TEST_VRSQRTE(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTE(, float, f, 32, 2);
+ TEST_VRSQRTE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (negative, infinity)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
+
+ /* Test FP variants with special input values (-0, -infinity). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, -0.0f);
++ VDUP(vector, q, float, f, 16, 8, -HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, -0.0f);
+ VDUP(vector, q, float, f, 32, 4, -HUGE_VALF);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTE(, float, f, 16, 4);
++ TEST_VRSQRTE(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTE(, float, f, 32, 2);
+ TEST_VRSQRTE(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (-0, -infinity)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp3, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp3, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp3, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp3, CMT);
+ }
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrteh_f16_1.c
+@@ -0,0 +1,30 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++float16_t input[] = { 123.4, 67.8, 34.8, 24.0, 66.1, 144.0, 4.8, 77.0 };
++uint16_t expected[] = { 0x2DC4 /* FP16_C (1/__builtin_sqrtf (123.4)). */,
++ 0x2FC8 /* FP16_C (1/__builtin_sqrtf (67.8)). */,
++ 0x316C /* FP16_C (1/__builtin_sqrtf (34.8)). */,
++ 0x3288 /* FP16_C (1/__builtin_sqrtf (24.0)). */,
++ 0x2FDC /* FP16_C (1/__builtin_sqrtf (66.1)). */,
++ 0x2D54 /* FP16_C (1/__builtin_sqrtf (144.0)). */,
++ 0x3750 /* FP16_C (1/__builtin_sqrtf (4.8)). */,
++ 0x2F48 /* FP16_C (1/__builtin_sqrtf (77.0)). */ };
++
++#define TEST_MSG "VRSQRTEH_F16"
++#define INSN_NAME vrsqrteh_f16
++
++#define INPUT input
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrts.c
+@@ -4,22 +4,51 @@
+ #include <math.h>
+
+ /* Expected results. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected, hfloat, 16, 4) [] = { 0xd3cb, 0xd3cb, 0xd3cb, 0xd3cb };
++VECT_VAR_DECL(expected, hfloat, 16, 8) [] = { 0xc726, 0xc726, 0xc726, 0xc726,
++ 0xc726, 0xc726, 0xc726, 0xc726 };
++#endif
+ VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc2796b84, 0xc2796b84 };
+ VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc0e4a3d8, 0xc0e4a3d8,
+ 0xc0e4a3d8, 0xc0e4a3d8 };
+
+ /* Expected results with input=NaN. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_nan, hfloat, 16, 4) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++VECT_VAR_DECL(expected_nan, hfloat, 16, 8) [] = { 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00,
++ 0x7e00, 0x7e00 };
++#endif
+ VECT_VAR_DECL(expected_nan,hfloat,32,2) [] = { 0x7fc00000, 0x7fc00000 };
+ VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000,
+ 0x7fc00000, 0x7fc00000 };
+
+ /* Expected results with FP special inputs values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 4) [] = { 0xfc00, 0xfc00,
++ 0xfc00, 0xfc00 };
++VECT_VAR_DECL(expected_fp1, hfloat, 16, 8) [] = { 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00 };
++#endif
+ VECT_VAR_DECL(expected_fp1,hfloat,32,2) [] = { 0xff800000, 0xff800000 };
+ VECT_VAR_DECL(expected_fp1,hfloat,32,4) [] = { 0x3fc00000, 0x3fc00000,
+ 0x3fc00000, 0x3fc00000 };
+
+ /* Expected results with only FP special inputs values (infinity,
+ 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 4) [] = { 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00 };
++VECT_VAR_DECL(expected_fp2, hfloat, 16, 8) [] = { 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00,
++ 0x3e00, 0x3e00 };
++#endif
+ VECT_VAR_DECL(expected_fp2,hfloat,32,2) [] = { 0x3fc00000, 0x3fc00000 };
+ VECT_VAR_DECL(expected_fp2,hfloat,32,4) [] = { 0x3fc00000, 0x3fc00000,
+ 0x3fc00000, 0x3fc00000 };
+@@ -38,75 +67,143 @@ void exec_vrsqrts(void)
+ VECT_VAR(vector_res, T1, W, N))
+
+ /* No need for integer variants. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector, float, 32, 2);
+ DECL_VARIABLE(vector, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector2, float, 32, 2);
+ DECL_VARIABLE(vector2, float, 32, 4);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
++#endif
+ DECL_VARIABLE(vector_res, float, 32, 2);
+ DECL_VARIABLE(vector_res, float, 32, 4);
+
+ clean_results ();
+
+ /* Choose init value arbitrarily. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, 12.9f);
++ VDUP(vector, q, float, f, 16, 8, 9.1f);
++#endif
+ VDUP(vector, , float, f, 32, 2, 12.9f);
+ VDUP(vector, q, float, f, 32, 4, 9.1f);
+
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector2, , float, f, 16, 4, 9.9f);
++ VDUP(vector2, q, float, f, 16, 8, 1.9f);
++#endif
+ VDUP(vector2, , float, f, 32, 2, 9.9f);
+ VDUP(vector2, q, float, f, 32, 4, 1.9f);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTS(, float, f, 16, 4);
++ TEST_VRSQRTS(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTS(, float, f, 32, 2);
+ TEST_VRSQRTS(q, float, f, 32, 4);
+
+ #define CMT ""
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, CMT);
+
+
+ /* Test FP variants with special input values (NaN). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, NAN);
++ VDUP(vector2, q, float, f, 16, 8, NAN);
++#endif
+ VDUP(vector, , float, f, 32, 2, NAN);
+ VDUP(vector2, q, float, f, 32, 4, NAN);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTS(, float, f, 16, 4);
++ TEST_VRSQRTS(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTS(, float, f, 32, 2);
+ TEST_VRSQRTS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (NAN) and normal values"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_nan, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_nan, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_nan, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_nan, CMT);
+
+
+ /* Test FP variants with special input values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, HUGE_VALF);
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ /* Restore a normal value in vector2. */
++ VDUP(vector2, q, float, f, 16, 8, 3.2f);
++#endif
+ VDUP(vector, , float, f, 32, 2, HUGE_VALF);
+ VDUP(vector, q, float, f, 32, 4, 0.0f);
+ /* Restore a normal value in vector2. */
+ VDUP(vector2, q, float, f, 32, 4, 3.2f);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTS(, float, f, 16, 4);
++ TEST_VRSQRTS(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTS(, float, f, 32, 2);
+ TEST_VRSQRTS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " FP special (infinity, 0) and normal values"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp1, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp1, CMT);
+
+
+ /* Test FP variants with only special input values (infinity, 0). */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ VDUP(vector, , float, f, 16, 4, HUGE_VALF);
++ VDUP(vector, q, float, f, 16, 8, 0.0f);
++ VDUP(vector2, , float, f, 16, 4, -0.0f);
++ VDUP(vector2, q, float, f, 16, 8, HUGE_VALF);
++#endif
+ VDUP(vector, , float, f, 32, 2, HUGE_VALF);
+ VDUP(vector, q, float, f, 32, 4, 0.0f);
+ VDUP(vector2, , float, f, 32, 2, -0.0f);
+ VDUP(vector2, q, float, f, 32, 4, HUGE_VALF);
+
+ /* Apply the operator. */
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ TEST_VRSQRTS(, float, f, 16, 4);
++ TEST_VRSQRTS(q, float, f, 16, 8);
++#endif
+ TEST_VRSQRTS(, float, f, 32, 2);
+ TEST_VRSQRTS(q, float, f, 32, 4);
+
+ #undef CMT
+ #define CMT " only FP special (infinity, 0)"
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_fp2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_fp2, CMT);
++#endif
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_fp2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_fp2, CMT);
+ }
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrsqrtsh_f16_1.c
+@@ -0,0 +1,50 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++/* { dg-skip-if "" { arm*-*-* } } */
++
++#include <arm_fp16.h>
++
++/* Input values. */
++#define A 12.4
++#define B -5.8
++#define C -3.8
++#define D 10
++#define E 66.1
++#define F 16.1
++#define G -4.8
++#define H -77
++
++#define I 0.7
++#define J -78
++#define K 10.23
++#define L 98
++#define M 87
++#define N -87.81
++#define O -1.1
++#define P 47.8
++
++float16_t input_1[] = { A, B, C, D, I, J, K, L };
++float16_t input_2[] = { E, F, G, H, M, N, O, P };
++uint16_t expected[] = { 0xDE62 /* (3.0f + (-A) * E) / 2.0f. */,
++ 0x5206 /* (3.0f + (-B) * F) / 2.0f. */,
++ 0xC7A0 /* (3.0f + (-C) * G) / 2.0f. */,
++ 0x5E0A /* (3.0f + (-D) * H) / 2.0f. */,
++ 0xCF3D /* (3.0f + (-I) * M) / 2.0f. */,
++ 0xEAB0 /* (3.0f + (-J) * N) / 2.0f. */,
++ 0x471F /* (3.0f + (-K) * O) / 2.0f. */,
++ 0xE893 /* (3.0f + (-L) * P) / 2.0f. */ };
++
++#define TEST_MSG "VRSQRTSH_F16"
++#define INSN_NAME vrsqrtsh_f16
++
++#define INPUT_1 input_1
++#define INPUT_2 input_2
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
+@@ -101,10 +101,8 @@ VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 0x7ffffffffffffff,
+ 0x7ffffffffffffff };
+
+
+-#ifndef INSN_NAME
+ #define INSN_NAME vshl
+ #define TEST_MSG "VSHL/VSHLQ"
+-#endif
+
+ #define FNNAME1(NAME) exec_ ## NAME
+ #define FNNAME(NAME) FNNAME1(NAME)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
+@@ -53,9 +53,17 @@ void FNNAME (INSN_NAME) (void)
+ DECL_VSHUFFLE(float, 32, 4)
+
+ DECL_ALL_VSHUFFLE();
++#if defined (FP16_SUPPORTED)
++ DECL_VSHUFFLE (float, 16, 4);
++ DECL_VSHUFFLE (float, 16, 8);
++#endif
+
+ /* Initialize input "vector" from "buffer". */
+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD (vector1, buffer, , float, f, 16, 4);
++ VLOAD (vector1, buffer, q, float, f, 16, 8);
++#endif
+ VLOAD(vector1, buffer, , float, f, 32, 2);
+ VLOAD(vector1, buffer, q, float, f, 32, 4);
+
+@@ -68,6 +76,9 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, , uint, u, 32, 2, 0x77);
+ VDUP(vector2, , poly, p, 8, 8, 0x55);
+ VDUP(vector2, , poly, p, 16, 4, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, , float, f, 16, 4, 14.6f); /* 14.6f is 0x4b4d. */
++#endif
+ VDUP(vector2, , float, f, 32, 2, 33.6f);
+
+ VDUP(vector2, q, int, s, 8, 16, 0x11);
+@@ -78,8 +89,11 @@ void FNNAME (INSN_NAME) (void)
+ VDUP(vector2, q, uint, u, 32, 4, 0x77);
+ VDUP(vector2, q, poly, p, 8, 16, 0x55);
+ VDUP(vector2, q, poly, p, 16, 8, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, q, float, f, 16, 8, 14.6f);
++#endif
+ VDUP(vector2, q, float, f, 32, 4, 33.8f);
+-
++
+ #define TEST_ALL_VSHUFFLE(INSN) \
+ TEST_VSHUFFLE(INSN, , int, s, 8, 8); \
+ TEST_VSHUFFLE(INSN, , int, s, 16, 4); \
+@@ -100,6 +114,10 @@ void FNNAME (INSN_NAME) (void)
+ TEST_VSHUFFLE(INSN, q, poly, p, 16, 8); \
+ TEST_VSHUFFLE(INSN, q, float, f, 32, 4)
+
++#define TEST_VSHUFFLE_FP16(INSN) \
++ TEST_VSHUFFLE(INSN, , float, f, 16, 4); \
++ TEST_VSHUFFLE(INSN, q, float, f, 16, 8);
++
+ #define TEST_ALL_EXTRA_CHUNKS() \
+ TEST_EXTRA_CHUNK(int, 8, 8, 1); \
+ TEST_EXTRA_CHUNK(int, 16, 4, 1); \
+@@ -143,17 +161,37 @@ void FNNAME (INSN_NAME) (void)
+ CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment); \
+ CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment); \
+ CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \
+- } \
++ }
++
++#define CHECK_RESULTS_VSHUFFLE_FP16(test_name,EXPECTED,comment) \
++ { \
++ CHECK_FP (test_name, float, 16, 4, PRIx16, EXPECTED, comment); \
++ CHECK_FP (test_name, float, 16, 8, PRIx16, EXPECTED, comment); \
++ }
+
+ clean_results ();
+
+ /* Execute the tests. */
+ TEST_ALL_VSHUFFLE(INSN_NAME);
++#if defined (FP16_SUPPORTED)
++ TEST_VSHUFFLE_FP16 (INSN_NAME);
++#endif
+
+ CHECK_RESULTS_VSHUFFLE (TEST_MSG, expected0, "(chunk 0)");
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS_VSHUFFLE_FP16 (TEST_MSG, expected0, "(chunk 0)");
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS();
++#if defined (FP16_SUPPORTED)
++ TEST_EXTRA_CHUNK (float, 16, 4, 1);
++ TEST_EXTRA_CHUNK (float, 16, 8, 1);
++#endif
+
-+#define TEST_VCREATE(T1, T2, W, N) \
-+ VECT_VAR(vcreate_vector_res, T1, W, N) = \
-+ vcreate_##T2##W(VECT_VAR(vcreate_val, T1, W, N)); \
-+ vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vcreate_vector_res, T1, W, N))
+ CHECK_RESULTS_VSHUFFLE (TEST_MSG, expected1, "(chunk 1)");
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS_VSHUFFLE_FP16 (TEST_MSG, expected1, "(chunk 1)");
++#endif
+ }
+
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
+@@ -166,9 +166,11 @@ void vsli_extra(void)
+ CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, int, 16, 8, PRIx16, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, int, 32, 4, PRIx32, expected_max_shift, COMMENT);
++ CHECK(TEST_MSG, int, 64, 2, PRIx64, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_max_shift, COMMENT);
++ CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_max_shift, COMMENT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_max_shift, COMMENT);
+ }
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrt_f16_1.c
+@@ -0,0 +1,72 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_hw } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+#define DECL_VAL(VAR, T1, W, N) \
-+ uint64_t VECT_VAR(VAR, T1, W, N)
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ DECL_VAL(vcreate_val, poly, 64, 1);
-+ DECL_VARIABLE(vcreate_vector_res, poly, 64, 1);
++#define FP16_C(a) ((__fp16) a)
++#define A FP16_C (123.4)
++#define B FP16_C (567.8)
++#define C FP16_C (34.8)
++#define D FP16_C (1024)
++#define E FP16_C (663.1)
++#define F FP16_C (144.0)
++#define G FP16_C (4.8)
++#define H FP16_C (77)
++
++#define SQRT_A 0x498E /* FP16_C (__builtin_sqrtf (123.4)). */
++#define SQRT_B 0x4DF5 /* FP16_C (__builtin_sqrtf (567.8)). */
++#define SQRT_C 0x45E6 /* FP16_C (__builtin_sqrtf (34.8)). */
++#define SQRT_D 0x5000 /* FP16_C (__builtin_sqrtf (1024)). */
++#define SQRT_E 0x4E70 /* FP16_C (__builtin_sqrtf (663.1)). */
++#define SQRT_F 0x4A00 /* FP16_C (__builtin_sqrtf (144.0)). */
++#define SQRT_G 0x4062 /* FP16_C (__builtin_sqrtf (4.8)). */
++#define SQRT_H 0x4863 /* FP16_C (__builtin_sqrtf (77)). */
++
++/* Expected results for vsqrt. */
++VECT_VAR_DECL (expected_static, hfloat, 16, 4) []
++ = { SQRT_A, SQRT_B, SQRT_C, SQRT_D };
++
++VECT_VAR_DECL (expected_static, hfloat, 16, 8) []
++ = { SQRT_A, SQRT_B, SQRT_C, SQRT_D, SQRT_E, SQRT_F, SQRT_G, SQRT_H };
++
++void exec_vsqrt_f16 (void)
++{
++#undef TEST_MSG
++#define TEST_MSG "VSQRT (FP16)"
++ clean_results ();
+
-+ CLEAN(result, poly, 64, 2);
++ DECL_VARIABLE(vsrc, float, 16, 4);
++ VECT_VAR_DECL (buf_src, float, 16, 4) [] = {A, B, C, D};
++ VLOAD (vsrc, buf_src, , float, f, 16, 4);
++ DECL_VARIABLE (vector_res, float, 16, 4)
++ = vsqrt_f16 (VECT_VAR (vsrc, float, 16, 4));
++ vst1_f16 (VECT_VAR (result, float, 16, 4),
++ VECT_VAR (vector_res, float, 16, 4));
+
-+ VECT_VAR(vcreate_val, poly, 64, 1) = 0x123456789abcdef0ULL;
++ CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected_static, "");
+
-+ TEST_VCREATE(poly, p, 64, 1);
++#undef TEST_MSG
++#define TEST_MSG "VSQRTQ (FP16)"
++ clean_results ();
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vcreate_expected, "");
++ DECL_VARIABLE(vsrc, float, 16, 8);
++ VECT_VAR_DECL (buf_src, float, 16, 8) [] = {A, B, C, D, E, F, G, H};
++ VLOAD (vsrc, buf_src, q, float, f, 16, 8);
++ DECL_VARIABLE (vector_res, float, 16, 8)
++ = vsqrtq_f16 (VECT_VAR (vsrc, float, 16, 8));
++ vst1q_f16 (VECT_VAR (result, float, 16, 8),
++ VECT_VAR (vector_res, float, 16, 8));
+
-+ /* vdup_lane_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VDUP_LANE/VDUP_LANEQ"
++ CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_static, "");
++}
+
-+#define TEST_VDUP_LANE(Q, T1, T2, W, N, N2, L) \
-+ VECT_VAR(vdup_lane_vector_res, T1, W, N) = \
-+ vdup##Q##_lane_##T2##W(VECT_VAR(vdup_lane_vector, T1, W, N2), L); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vdup_lane_vector_res, T1, W, N))
++int
++main (void)
++{
++ exec_vsqrt_f16 ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c
+@@ -0,0 +1,40 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+ DECL_VARIABLE(vdup_lane_vector, poly, 64, 1);
-+ DECL_VARIABLE(vdup_lane_vector, poly, 64, 2);
-+ DECL_VARIABLE(vdup_lane_vector_res, poly, 64, 1);
-+ DECL_VARIABLE(vdup_lane_vector_res, poly, 64, 2);
++#include <arm_fp16.h>
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0x0000 /* 0.000000 */,
++ 0x8000 /* -0.000000 */,
++ 0x3da8 /* 1.414062 */,
++ 0x3f0b /* 1.760742 */,
++ 0x4479 /* 4.472656 */,
++ 0x390f /* 0.632324 */,
++ 0x7e00 /* nan */,
++ 0x3c9d /* 1.153320 */,
++ 0x7e00 /* nan */,
++ 0x3874 /* 0.556641 */,
++ 0x38a2 /* 0.579102 */,
++ 0x39a8 /* 0.707031 */,
++ 0x3c00 /* 1.000000 */,
++ 0x433f /* 3.623047 */,
++ 0x7e00 /* nan */,
++ 0x4479 /* 4.472656 */,
++ 0x7c00 /* inf */,
++ 0x7e00 /* nan */
++};
+
-+ VLOAD(vdup_lane_vector, buffer, , poly, p, 64, 1);
++#define TEST_MSG "VSQRTH_F16"
++#define INSN_NAME vsqrth_f16
+
-+ TEST_VDUP_LANE(, poly, p, 64, 1, 1, 0);
-+ TEST_VDUP_LANE(q, poly, p, 64, 2, 1, 0);
++#define EXPECTED expected
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_lane_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_lane_expected, "");
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
+
-+ /* vdup_n_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VDUP/VDUPQ"
++/* Include the template for unary scalar operations. */
++#include "unary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst2_lane_f16 (float16_t * p, float16x4x2_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst2q_lane_f16 (float16_t * p, float16x8x2_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst3_lane_f16 (float16_t * p, float16x4x3_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst3q_lane_f16 (float16_t * p, float16x8x3_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst4_lane_f16 (float16_t * p, float16x4x4_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c
+@@ -2,6 +2,7 @@
+
+ /* { dg-do compile } */
+ /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
++/* { dg-require-effective-target arm_neon_fp16_ok { target { arm*-*-* } } } */
+
+ void
+ f_vst4q_lane_f16 (float16_t * p, float16x8x4_t v)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
+@@ -14,6 +14,7 @@ VECT_VAR_DECL(expected_st2_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected_st2_0,poly,8,8) [] = { 0xf0, 0xf1, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st2_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected_st2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -24,6 +25,8 @@ VECT_VAR_DECL(expected_st2_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
+ 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st2_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0x0, 0x0 };
+
+@@ -39,6 +42,7 @@ VECT_VAR_DECL(expected_st2_1,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st2_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,hfloat,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -48,6 +52,8 @@ VECT_VAR_DECL(expected_st2_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st2_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st2_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st2_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results for vst3, chunk 0. */
+@@ -62,6 +68,7 @@ VECT_VAR_DECL(expected_st3_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected_st3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0 };
++VECT_VAR_DECL(expected_st3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0 };
+ VECT_VAR_DECL(expected_st3_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected_st3_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -73,6 +80,8 @@ VECT_VAR_DECL(expected_st3_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
+ 0xfffffff2, 0x0 };
+ VECT_VAR_DECL(expected_st3_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st3_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0xc1600000, 0x0 };
+
+@@ -88,6 +97,7 @@ VECT_VAR_DECL(expected_st3_1,uint,32,2) [] = { 0xfffffff2, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st3_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,hfloat,32,2) [] = { 0xc1600000, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -97,6 +107,8 @@ VECT_VAR_DECL(expected_st3_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st3_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st3_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results for vst3, chunk 2. */
+@@ -111,6 +123,7 @@ VECT_VAR_DECL(expected_st3_2,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st3_2,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,hfloat,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -120,6 +133,8 @@ VECT_VAR_DECL(expected_st3_2,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st3_2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st3_2,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st3_2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results for vst4, chunk 0. */
+@@ -134,6 +149,7 @@ VECT_VAR_DECL(expected_st4_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected_st4_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++VECT_VAR_DECL(expected_st4_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+ VECT_VAR_DECL(expected_st4_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected_st4_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -145,6 +161,8 @@ VECT_VAR_DECL(expected_st4_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
+ 0xfffffff2, 0xfffffff3 };
+ VECT_VAR_DECL(expected_st4_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0xc1600000, 0xc1500000 };
+
+@@ -160,6 +178,7 @@ VECT_VAR_DECL(expected_st4_1,uint,32,2) [] = { 0xfffffff2, 0xfffffff3 };
+ VECT_VAR_DECL(expected_st4_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
+ VECT_VAR_DECL(expected_st4_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -169,6 +188,8 @@ VECT_VAR_DECL(expected_st4_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st4_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results for vst4, chunk 2. */
+@@ -183,6 +204,7 @@ VECT_VAR_DECL(expected_st4_2,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_2,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,hfloat,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -192,6 +214,8 @@ VECT_VAR_DECL(expected_st4_2,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st4_2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_2,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Expected results for vst4, chunk 3. */
+@@ -206,6 +230,7 @@ VECT_VAR_DECL(expected_st4_3,uint,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_3,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,hfloat,32,2) [] = { 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
+@@ -215,6 +240,8 @@ VECT_VAR_DECL(expected_st4_3,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ VECT_VAR_DECL(expected_st4_3,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+ 0x0, 0x0, 0x0, 0x0 };
++VECT_VAR_DECL(expected_st4_3,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
++ 0x0, 0x0, 0x0, 0x0 };
+ VECT_VAR_DECL(expected_st4_3,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+
+ /* Declare additional input buffers as needed. */
+@@ -229,6 +256,9 @@ VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 32, 2);
+ VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 64, 2);
+ VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 8, 2);
+ VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 16, 2);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 16, 2);
++#endif
+ VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 32, 2);
+
+ /* Input buffers for vld3_lane. */
+@@ -242,6 +272,9 @@ VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 32, 3);
+ VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 64, 3);
+ VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 8, 3);
+ VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 16, 3);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 16, 3);
++#endif
+ VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 32, 3);
+
+ /* Input buffers for vld4_lane. */
+@@ -255,6 +288,9 @@ VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 32, 4);
+ VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 64, 4);
+ VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 8, 4);
+ VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 16, 4);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 16, 4);
++#endif
+ VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 32, 4);
+
+ void exec_vstX_lane (void)
+@@ -302,7 +338,7 @@ void exec_vstX_lane (void)
+
+ /* We need all variants in 64 bits, but there is no 64x2 variant,
+ nor 128 bits vectors of int8/uint8/poly8. */
+-#define DECL_ALL_VSTX_LANE(X) \
++#define DECL_ALL_VSTX_LANE_NO_FP16(X) \
+ DECL_VSTX_LANE(int, 8, 8, X); \
+ DECL_VSTX_LANE(int, 16, 4, X); \
+ DECL_VSTX_LANE(int, 32, 2, X); \
+@@ -319,11 +355,20 @@ void exec_vstX_lane (void)
+ DECL_VSTX_LANE(poly, 16, 8, X); \
+ DECL_VSTX_LANE(float, 32, 4, X)
+
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++#define DECL_ALL_VSTX_LANE(X) \
++ DECL_ALL_VSTX_LANE_NO_FP16(X); \
++ DECL_VSTX_LANE(float, 16, 4, X); \
++ DECL_VSTX_LANE(float, 16, 8, X)
++#else
++#define DECL_ALL_VSTX_LANE(X) DECL_ALL_VSTX_LANE_NO_FP16(X)
++#endif
+
-+#define TEST_VDUP(Q, T1, T2, W, N) \
-+ VECT_VAR(vdup_n_vector, T1, W, N) = \
-+ vdup##Q##_n_##T2##W(VECT_VAR(buffer_dup, T1, W, N)[i]); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vdup_n_vector, T1, W, N))
+ #define DUMMY_ARRAY(V, T, W, N, L) VECT_VAR_DECL(V,T,W,N)[N*L]
+
+ /* Use the same lanes regardless of the size of the array (X), for
+ simplicity. */
+-#define TEST_ALL_VSTX_LANE(X) \
++#define TEST_ALL_VSTX_LANE_NO_FP16(X) \
+ TEST_VSTX_LANE(, int, s, 8, 8, X, 7); \
+ TEST_VSTX_LANE(, int, s, 16, 4, X, 2); \
+ TEST_VSTX_LANE(, int, s, 32, 2, X, 0); \
+@@ -340,7 +385,16 @@ void exec_vstX_lane (void)
+ TEST_VSTX_LANE(q, poly, p, 16, 8, X, 5); \
+ TEST_VSTX_LANE(q, float, f, 32, 4, X, 2)
+
+-#define TEST_ALL_EXTRA_CHUNKS(X, Y) \
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++#define TEST_ALL_VSTX_LANE(X) \
++ TEST_ALL_VSTX_LANE_NO_FP16(X); \
++ TEST_VSTX_LANE(, float, f, 16, 4, X, 2); \
++ TEST_VSTX_LANE(q, float, f, 16, 8, X, 6)
++#else
++#define TEST_ALL_VSTX_LANE(X) TEST_ALL_VSTX_LANE_NO_FP16(X)
++#endif
+
-+ DECL_VARIABLE(vdup_n_vector, poly, 64, 1);
-+ DECL_VARIABLE(vdup_n_vector, poly, 64, 2);
++#define TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y) \
+ TEST_EXTRA_CHUNK(int, 8, 8, X, Y); \
+ TEST_EXTRA_CHUNK(int, 16, 4, X, Y); \
+ TEST_EXTRA_CHUNK(int, 32, 2, X, Y); \
+@@ -357,6 +411,15 @@ void exec_vstX_lane (void)
+ TEST_EXTRA_CHUNK(poly, 16, 8, X, Y); \
+ TEST_EXTRA_CHUNK(float, 32, 4, X, Y)
+
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++#define TEST_ALL_EXTRA_CHUNKS(X,Y) \
++ TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y); \
++ TEST_EXTRA_CHUNK(float, 16, 4, X, Y); \
++ TEST_EXTRA_CHUNK(float, 16, 8, X, Y)
++#else
++#define TEST_ALL_EXTRA_CHUNKS(X,Y) TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)
++#endif
+
-+ /* Try to read different places from the input buffer. */
-+ for (i=0; i< 3; i++) {
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
+ /* Declare the temporary buffers / variables. */
+ DECL_ALL_VSTX_LANE(2);
+ DECL_ALL_VSTX_LANE(3);
+@@ -371,12 +434,18 @@ void exec_vstX_lane (void)
+ DUMMY_ARRAY(buffer_src, uint, 32, 2, 4);
+ DUMMY_ARRAY(buffer_src, poly, 8, 8, 4);
+ DUMMY_ARRAY(buffer_src, poly, 16, 4, 4);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ DUMMY_ARRAY(buffer_src, float, 16, 4, 4);
++#endif
+ DUMMY_ARRAY(buffer_src, float, 32, 2, 4);
+ DUMMY_ARRAY(buffer_src, int, 16, 8, 4);
+ DUMMY_ARRAY(buffer_src, int, 32, 4, 4);
+ DUMMY_ARRAY(buffer_src, uint, 16, 8, 4);
+ DUMMY_ARRAY(buffer_src, uint, 32, 4, 4);
+ DUMMY_ARRAY(buffer_src, poly, 16, 8, 4);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ DUMMY_ARRAY(buffer_src, float, 16, 8, 4);
++#endif
+ DUMMY_ARRAY(buffer_src, float, 32, 4, 4);
+
+ /* Check vst2_lane/vst2q_lane. */
+@@ -400,6 +469,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st2_0, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st2_0, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st2_0, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st2_0, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st2_0, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(2, 1);
+ #undef CMT
+@@ -419,6 +492,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st2_1, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st2_1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st2_1, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st2_1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st2_1, CMT);
++#endif
+
+
+ /* Check vst3_lane/vst3q_lane. */
+@@ -444,6 +521,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_0, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_0, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_0, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_0, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_0, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(3, 1);
+
+@@ -464,6 +545,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_1, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_1, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_1, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(3, 2);
+
+@@ -484,6 +569,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_2, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_2, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_2, CMT);
++#endif
+
+
+ /* Check vst4_lane/vst4q_lane. */
+@@ -509,6 +598,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_0, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_0, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_0, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_0, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_0, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(4, 1);
+
+@@ -529,6 +622,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_1, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_1, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_1, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_1, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_1, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(4, 2);
+
+@@ -549,6 +646,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_2, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_2, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_2, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_2, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_2, CMT);
++#endif
+
+ TEST_ALL_EXTRA_CHUNKS(4, 3);
+
+@@ -569,6 +670,10 @@ void exec_vstX_lane (void)
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_3, CMT);
+ CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_3, CMT);
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_3, CMT);
++#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_3, CMT);
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_3, CMT);
++#endif
+ }
+
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsub.c
+@@ -44,6 +44,14 @@ VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffffffffffffffed,
+ VECT_VAR_DECL(expected_float32,hfloat,32,2) [] = { 0xc00ccccd, 0xc00ccccd };
+ VECT_VAR_DECL(expected_float32,hfloat,32,4) [] = { 0xc00ccccc, 0xc00ccccc,
+ 0xc00ccccc, 0xc00ccccc };
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++VECT_VAR_DECL(expected_float16, hfloat, 16, 4) [] = { 0xc066, 0xc066,
++ 0xc066, 0xc066 };
++VECT_VAR_DECL(expected_float16, hfloat, 16, 8) [] = { 0xc067, 0xc067,
++ 0xc067, 0xc067,
++ 0xc067, 0xc067,
++ 0xc067, 0xc067 };
++#endif
+
+ void exec_vsub_f32(void)
+ {
+@@ -67,4 +75,27 @@ void exec_vsub_f32(void)
+
+ CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected_float32, "");
+ CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_float32, "");
+
-+ TEST_VDUP(, poly, p, 64, 1);
-+ TEST_VDUP(q, poly, p, 64, 2);
++#if defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ DECL_VARIABLE(vector, float, 16, 4);
++ DECL_VARIABLE(vector, float, 16, 8);
+
-+ switch (i) {
-+ case 0:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected0, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected0, "");
-+ break;
-+ case 1:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected1, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected1, "");
-+ break;
-+ case 2:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vdup_n_expected2, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vdup_n_expected2, "");
-+ break;
-+ default:
-+ abort();
-+ }
-+ }
++ DECL_VARIABLE(vector2, float, 16, 4);
++ DECL_VARIABLE(vector2, float, 16, 8);
+
-+ /* vexit_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VEXT/VEXTQ"
++ DECL_VARIABLE(vector_res, float, 16, 4);
++ DECL_VARIABLE(vector_res, float, 16, 8);
+
-+#define TEST_VEXT(Q, T1, T2, W, N, V) \
-+ VECT_VAR(vext_vector_res, T1, W, N) = \
-+ vext##Q##_##T2##W(VECT_VAR(vext_vector1, T1, W, N), \
-+ VECT_VAR(vext_vector2, T1, W, N), \
-+ V); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vext_vector_res, T1, W, N))
++ VDUP(vector, , float, f, 16, 4, 2.3f);
++ VDUP(vector, q, float, f, 16, 8, 3.4f);
+
-+ DECL_VARIABLE(vext_vector1, poly, 64, 1);
-+ DECL_VARIABLE(vext_vector1, poly, 64, 2);
-+ DECL_VARIABLE(vext_vector2, poly, 64, 1);
-+ DECL_VARIABLE(vext_vector2, poly, 64, 2);
-+ DECL_VARIABLE(vext_vector_res, poly, 64, 1);
-+ DECL_VARIABLE(vext_vector_res, poly, 64, 2);
++ VDUP(vector2, , float, f, 16, 4, 4.5f);
++ VDUP(vector2, q, float, f, 16, 8, 5.6f);
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++ TEST_BINARY_OP(INSN_NAME, , float, f, 16, 4);
++ TEST_BINARY_OP(INSN_NAME, q, float, f, 16, 8);
+
-+ VLOAD(vext_vector1, buffer, , poly, p, 64, 1);
-+ VLOAD(vext_vector1, buffer, q, poly, p, 64, 2);
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_float16, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_float16, "");
++#endif
+ }
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c
+@@ -0,0 +1,42 @@
++/* { dg-do run } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_hw } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
-+ VDUP(vext_vector2, , poly, p, 64, 1, 0x88);
-+ VDUP(vext_vector2, q, poly, p, 64, 2, 0x88);
++#include <arm_fp16.h>
+
-+ TEST_VEXT(, poly, p, 64, 1, 0);
-+ TEST_VEXT(q, poly, p, 64, 2, 1);
++#define INFF __builtin_inf ()
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vext_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vext_expected, "");
++/* Expected results (16-bit hexadecimal representation). */
++uint16_t expected[] =
++{
++ 0xbc00 /* -1.000000 */,
++ 0xbc00 /* -1.000000 */,
++ 0x4654 /* 6.328125 */,
++ 0xd60e /* -96.875000 */,
++ 0xc900 /* -10.000000 */,
++ 0x36b8 /* 0.419922 */,
++ 0xc19a /* -2.800781 */,
++ 0x4848 /* 8.562500 */,
++ 0xbd34 /* -1.300781 */,
++ 0xccec /* -19.687500 */,
++ 0x4791 /* 7.566406 */,
++ 0xbf34 /* -1.800781 */,
++ 0x484d /* 8.601562 */,
++ 0x4804 /* 8.031250 */,
++ 0xc69c /* -6.609375 */,
++ 0x4ceb /* 19.671875 */,
++ 0x7c00 /* inf */,
++ 0xfc00 /* -inf */
++};
+
-+ /* vget_low_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VGET_LOW"
++#define TEST_MSG "VSUB_F16"
++#define INSN_NAME vsubh_f16
++
++#define EXPECTED expected
++
++#define INPUT_TYPE float16_t
++#define OUTPUT_TYPE float16_t
++#define OUTPUT_TYPE_SIZE 16
++
++/* Include the template for binary scalar operations. */
++#include "binary_scalar_op.inc"
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn.c
+@@ -15,6 +15,10 @@ VECT_VAR_DECL(expected0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0x55, 0x55,
+ 0xf2, 0xf3, 0x55, 0x55 };
+ VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0x11, 0x11,
+ 0xf2, 0xf3, 0x11, 0x11,
+@@ -36,6 +40,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf1, 0x55, 0x55,
+ 0xf6, 0xf7, 0x55, 0x55 };
+ VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x66, 0x66,
+ 0xfff2, 0xfff3, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0x4b4d, 0x4b4d,
++ 0xcb00, 0xca80,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0x42073333, 0x42073333 };
+
+@@ -51,6 +61,10 @@ VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 };
+ VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf4, 0xf5, 0x55, 0x55,
+ 0xf6, 0xf7, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff2, 0xfff3, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb00, 0xca80,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
+ VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf8, 0xf9, 0x11, 0x11,
+ 0xfa, 0xfb, 0x11, 0x11,
+@@ -72,6 +86,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf8, 0xf9, 0x55, 0x55,
+ 0xfe, 0xff, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff4, 0xfff5, 0x66, 0x66,
+ 0xfff6, 0xfff7, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xca00, 0xc980,
++ 0x4b4d, 0x4b4d,
++ 0xc900, 0xc880,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1600000, 0xc1500000,
+ 0x42073333, 0x42073333 };
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
+@@ -0,0 +1,263 @@
++/* { dg-do run } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+#define TEST_VGET_LOW(T1, T2, W, N, N2) \
-+ VECT_VAR(vget_low_vector64, T1, W, N) = \
-+ vget_low_##T2##W(VECT_VAR(vget_low_vector128, T1, W, N2)); \
-+ vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vget_low_vector64, T1, W, N))
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ DECL_VARIABLE(vget_low_vector64, poly, 64, 1);
-+ DECL_VARIABLE(vget_low_vector128, poly, 64, 2);
++/* Expected results. */
++VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x11, 0xf2, 0x11,
++ 0xf4, 0x11, 0xf6, 0x11 };
++VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0x22, 0xfff2, 0x22 };
++VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
++VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0x55, 0xf2, 0x55,
++ 0xf4, 0x55, 0xf6, 0x55 };
++VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0x66, 0xfff2, 0x66 };
++VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
++VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0x55, 0xf2, 0x55,
++ 0xf4, 0x55, 0xf6, 0x55 };
++VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0x66, 0xfff2, 0x66 };
++VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0x4b4d,
++ 0xcb00, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0x11, 0xf2, 0x11,
++ 0xf4, 0x11, 0xf6, 0x11,
++ 0xf8, 0x11, 0xfa, 0x11,
++ 0xfc, 0x11, 0xfe, 0x11 };
++VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0x22, 0xfff2, 0x22,
++ 0xfff4, 0x22, 0xfff6, 0x22 };
++VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0x33,
++ 0xfffffff2, 0x33 };
++VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
++ 0x44 };
++VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0x55, 0xf2, 0x55,
++ 0xf4, 0x55, 0xf6, 0x55,
++ 0xf8, 0x55, 0xfa, 0x55,
++ 0xfc, 0x55, 0xfe, 0x55 };
++VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0x66, 0xfff2, 0x66,
++ 0xfff4, 0x66, 0xfff6, 0x66 };
++VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0x77,
++ 0xfffffff2, 0x77 };
++VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0x88 };
++VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0x55, 0xf2, 0x55,
++ 0xf4, 0x55, 0xf6, 0x55,
++ 0xf8, 0x55, 0xfa, 0x55,
++ 0xfc, 0x55, 0xfe, 0x55 };
++VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0x66, 0xfff2, 0x66,
++ 0xfff4, 0x66, 0xfff6, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0x4b4d,
++ 0xcb00, 0x4b4d,
++ 0xca00, 0x4b4d,
++ 0xc900, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0x42073333,
++ 0xc1600000, 0x42073333 };
+
-+ CLEAN(result, poly, 64, 1);
++#define TEST_MSG "VTRN1"
++void exec_vtrn_half (void)
++{
++#define TEST_VTRN(PART, Q, T1, T2, W, N) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ vtrn##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \
++ VECT_VAR(vector2, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
-+ VLOAD(vget_low_vector128, buffer, q, poly, p, 64, 2);
++#define TEST_VTRN1(Q, T1, T2, W, N) TEST_VTRN(1, Q, T1, T2, W, N)
+
-+ TEST_VGET_LOW(poly, p, 64, 1, 2);
++ /* Input vector can only have 64 bits. */
++ DECL_VARIABLE_ALL_VARIANTS(vector);
++ DECL_VARIABLE_ALL_VARIANTS(vector2);
++ DECL_VARIABLE(vector, float, 64, 2);
++ DECL_VARIABLE(vector2, float, 64, 2);
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_low_expected, "");
++ DECL_VARIABLE_ALL_VARIANTS(vector_res);
++ DECL_VARIABLE(vector_res, float, 64, 2);
+
-+ /* vld1_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VLD1/VLD1Q"
++ clean_results ();
++ /* We don't have vtrn1_T64x1, so set expected to the clean value. */
++ CLEAN(expected, int, 64, 1);
++ CLEAN(expected, uint, 64, 1);
+
-+#define TEST_VLD1(VAR, BUF, Q, T1, T2, W, N) \
-+ VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N)); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
++ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vector, buffer, , float, f, 32, 2);
++ VLOAD(vector, buffer, q, float, f, 32, 4);
++ VLOAD(vector, buffer, q, float, f, 64, 2);
++
++ /* Choose arbitrary initialization values. */
++ VDUP(vector2, , int, s, 8, 8, 0x11);
++ VDUP(vector2, , int, s, 16, 4, 0x22);
++ VDUP(vector2, , int, s, 32, 2, 0x33);
++ VDUP(vector2, , uint, u, 8, 8, 0x55);
++ VDUP(vector2, , uint, u, 16, 4, 0x66);
++ VDUP(vector2, , uint, u, 32, 2, 0x77);
++ VDUP(vector2, , poly, p, 8, 8, 0x55);
++ VDUP(vector2, , poly, p, 16, 4, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, , float, f, 16, 4, 14.6f); /* 14.6f is 0x4b4d. */
++#endif
++ VDUP(vector2, , float, f, 32, 2, 33.6f);
++
++ VDUP(vector2, q, int, s, 8, 16, 0x11);
++ VDUP(vector2, q, int, s, 16, 8, 0x22);
++ VDUP(vector2, q, int, s, 32, 4, 0x33);
++ VDUP(vector2, q, int, s, 64, 2, 0x44);
++ VDUP(vector2, q, uint, u, 8, 16, 0x55);
++ VDUP(vector2, q, uint, u, 16, 8, 0x66);
++ VDUP(vector2, q, uint, u, 32, 4, 0x77);
++ VDUP(vector2, q, uint, u, 64, 2, 0x88);
++ VDUP(vector2, q, poly, p, 8, 16, 0x55);
++ VDUP(vector2, q, poly, p, 16, 8, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, q, float, f, 16, 8, 14.6f);
++#endif
++ VDUP(vector2, q, float, f, 32, 4, 33.8f);
++ VDUP(vector2, q, float, f, 64, 2, 33.8f);
++
++ TEST_VTRN1(, int, s, 8, 8);
++ TEST_VTRN1(, int, s, 16, 4);
++ TEST_VTRN1(, int, s, 32, 2);
++ TEST_VTRN1(, uint, u, 8, 8);
++ TEST_VTRN1(, uint, u, 16, 4);
++ TEST_VTRN1(, uint, u, 32, 2);
++ TEST_VTRN1(, poly, p, 8, 8);
++ TEST_VTRN1(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VTRN1(, float, f, 16, 4);
++#endif
++ TEST_VTRN1(, float, f, 32, 2);
++
++ TEST_VTRN1(q, int, s, 8, 16);
++ TEST_VTRN1(q, int, s, 16, 8);
++ TEST_VTRN1(q, int, s, 32, 4);
++ TEST_VTRN1(q, int, s, 64, 2);
++ TEST_VTRN1(q, uint, u, 8, 16);
++ TEST_VTRN1(q, uint, u, 16, 8);
++ TEST_VTRN1(q, uint, u, 32, 4);
++ TEST_VTRN1(q, uint, u, 64, 2);
++ TEST_VTRN1(q, poly, p, 8, 16);
++ TEST_VTRN1(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VTRN1(q, float, f, 16, 8);
++#endif
++ TEST_VTRN1(q, float, f, 32, 4);
++ TEST_VTRN1(q, float, f, 64, 2);
+
-+ DECL_VARIABLE(vld1_vector, poly, 64, 1);
-+ DECL_VARIABLE(vld1_vector, poly, 64, 2);
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
++ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++#undef TEST_MSG
++#define TEST_MSG "VTRN2"
+
-+ VLOAD(vld1_vector, buffer, , poly, p, 64, 1);
-+ VLOAD(vld1_vector, buffer, q, poly, p, 64, 2);
++#define TEST_VTRN2(Q, T1, T2, W, N) TEST_VTRN(2, Q, T1, T2, W, N)
+
-+ TEST_VLD1(vld1_vector, buffer, , poly, p, 64, 1);
-+ TEST_VLD1(vld1_vector, buffer, q, poly, p, 64, 2);
++/* Expected results. */
++VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf1, 0x11, 0xf3, 0x11,
++ 0xf5, 0x11, 0xf7, 0x11 };
++VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff1, 0x22, 0xfff3, 0x22 };
++VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
++VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf1, 0x55, 0xf3, 0x55,
++ 0xf5, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff1, 0x66, 0xfff3, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf1, 0x55, 0xf3, 0x55,
++ 0xf5, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff1, 0x66, 0xfff3, 0x66 };
++VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb80, 0x4b4d,
++ 0xca80, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf1, 0x11, 0xf3, 0x11,
++ 0xf5, 0x11, 0xf7, 0x11,
++ 0xf9, 0x11, 0xfb, 0x11,
++ 0xfd, 0x11, 0xff, 0x11 };
++VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff1, 0x22, 0xfff3, 0x22,
++ 0xfff5, 0x22, 0xfff7, 0x22 };
++VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0x33,
++ 0xfffffff3, 0x33 };
++VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
++ 0x44 };
++VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf1, 0x55, 0xf3, 0x55,
++ 0xf5, 0x55, 0xf7, 0x55,
++ 0xf9, 0x55, 0xfb, 0x55,
++ 0xfd, 0x55, 0xff, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0x66, 0xfff3, 0x66,
++ 0xfff5, 0x66, 0xfff7, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff1, 0x77,
++ 0xfffffff3, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
++ 0x88 };
++VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf1, 0x55, 0xf3, 0x55,
++ 0xf5, 0x55, 0xf7, 0x55,
++ 0xf9, 0x55, 0xfb, 0x55,
++ 0xfd, 0x55, 0xff, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0x66, 0xfff3, 0x66,
++ 0xfff5, 0x66, 0xfff7, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb80, 0x4b4d,
++ 0xca80, 0x4b4d,
++ 0xc980, 0x4b4d,
++ 0xc880, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0x42073333,
++ 0xc1500000, 0x42073333 };
++ clean_results ();
++ CLEAN(expected2, int, 64, 1);
++ CLEAN(expected2, uint, 64, 1);
++
++ TEST_VTRN2(, int, s, 8, 8);
++ TEST_VTRN2(, int, s, 16, 4);
++ TEST_VTRN2(, int, s, 32, 2);
++ TEST_VTRN2(, uint, u, 8, 8);
++ TEST_VTRN2(, uint, u, 16, 4);
++ TEST_VTRN2(, uint, u, 32, 2);
++ TEST_VTRN2(, poly, p, 8, 8);
++ TEST_VTRN2(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VTRN2(, float, f, 16, 4);
++#endif
++ TEST_VTRN2(, float, f, 32, 2);
++
++ TEST_VTRN2(q, int, s, 8, 16);
++ TEST_VTRN2(q, int, s, 16, 8);
++ TEST_VTRN2(q, int, s, 32, 4);
++ TEST_VTRN2(q, int, s, 64, 2);
++ TEST_VTRN2(q, uint, u, 8, 16);
++ TEST_VTRN2(q, uint, u, 16, 8);
++ TEST_VTRN2(q, uint, u, 32, 4);
++ TEST_VTRN2(q, uint, u, 64, 2);
++ TEST_VTRN2(q, poly, p, 8, 16);
++ TEST_VTRN2(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VTRN2(q, float, f, 16, 8);
++#endif
++ TEST_VTRN2(q, float, f, 32, 4);
++ TEST_VTRN2(q, float, f, 64, 2);
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_expected, "");
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++#if defined (FP16_SUPPORTED)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
++#endif
++}
+
-+ /* vld1_dup_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VLD1_DUP/VLD1_DUPQ"
++int main (void)
++{
++ exec_vtrn_half ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
+@@ -32,10 +32,21 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 0xffff,
+ VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0xffffffff,
+ 0x0, 0xffffffff };
+
+-#ifndef INSN_NAME
++/* Expected results with poly input. */
++VECT_VAR_DECL(expected_poly,uint,8,8) [] = { 0x0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_poly,uint,8,16) [] = { 0x0, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff,
++ 0xff, 0xff, 0xff, 0xff };
++VECT_VAR_DECL(expected_poly,uint,16,4) [] = { 0x0, 0xffff, 0x0, 0xffff };
++VECT_VAR_DECL(expected_poly,uint,16,8) [] = { 0x0, 0xffff,
++ 0x0, 0xffff,
++ 0xffff, 0xffff,
++ 0xffff, 0xffff };
+
-+#define TEST_VLD1_DUP(VAR, BUF, Q, T1, T2, W, N) \
-+ VECT_VAR(VAR, T1, W, N) = \
-+ vld1##Q##_dup_##T2##W(&VECT_VAR(BUF, T1, W, N)[i]); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
+ #define INSN_NAME vtst
+ #define TEST_MSG "VTST/VTSTQ"
+-#endif
+
+ /* We can't use the standard ref_v_binary_op.c template because vtst
+ has no 64 bits variant, and outputs are always of uint type. */
+@@ -73,12 +84,16 @@ FNNAME (INSN_NAME)
+ VDUP(vector2, , uint, u, 8, 8, 15);
+ VDUP(vector2, , uint, u, 16, 4, 5);
+ VDUP(vector2, , uint, u, 32, 2, 1);
++ VDUP(vector2, , poly, p, 8, 8, 15);
++ VDUP(vector2, , poly, p, 16, 4, 5);
+ VDUP(vector2, q, int, s, 8, 16, 15);
+ VDUP(vector2, q, int, s, 16, 8, 5);
+ VDUP(vector2, q, int, s, 32, 4, 1);
+ VDUP(vector2, q, uint, u, 8, 16, 15);
+ VDUP(vector2, q, uint, u, 16, 8, 5);
+ VDUP(vector2, q, uint, u, 32, 4, 1);
++ VDUP(vector2, q, poly, p, 8, 16, 15);
++ VDUP(vector2, q, poly, p, 16, 8, 5);
+
+ #define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR, T1, T2) \
+ MACRO(VAR, , T1, T2, 8, 8); \
+@@ -111,6 +126,18 @@ FNNAME (INSN_NAME)
+ CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_unsigned, CMT);
+ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_unsigned, CMT);
+ CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_unsigned, CMT);
+
-+ DECL_VARIABLE(vld1_dup_vector, poly, 64, 1);
-+ DECL_VARIABLE(vld1_dup_vector, poly, 64, 2);
++ /* Now, test the variants with poly8 and poly16 as input. */
++#undef CMT
++#define CMT " (poly input)"
++ TEST_BINARY_OP(INSN_NAME, , poly, p, 8, 8);
++ TEST_BINARY_OP(INSN_NAME, , poly, p, 16, 4);
++ TEST_BINARY_OP(INSN_NAME, q, poly, p, 8, 16);
++ TEST_BINARY_OP(INSN_NAME, q, poly, p, 16, 8);
++ CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_poly, CMT);
++ CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected_poly, CMT);
++ CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_poly, CMT);
++ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_poly, CMT);
+ }
+
+ int main (void)
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp.c
+@@ -19,6 +19,10 @@ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7 };
+ VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80 };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+ 0xf4, 0xf5, 0xf6, 0xf7,
+@@ -48,6 +52,12 @@ VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff1,
+ 0xfff2, 0xfff3,
+ 0xfff4, 0xfff5,
+ 0xfff6, 0xfff7 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xcb80,
++ 0xcb00, 0xca80,
++ 0xca00, 0xc980,
++ 0xc900, 0xc880 };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+ 0xc1600000, 0xc1500000 };
+
+@@ -63,6 +73,10 @@ VECT_VAR_DECL(expected1,uint,32,2) [] = { 0x77, 0x77 };
+ VECT_VAR_DECL(expected1,poly,8,8) [] = { 0x55, 0x55, 0x55, 0x55,
+ 0x55, 0x55, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,4) [] = { 0x66, 0x66, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
+ VECT_VAR_DECL(expected1,int,8,16) [] = { 0x11, 0x11, 0x11, 0x11,
+ 0x11, 0x11, 0x11, 0x11,
+@@ -84,6 +98,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0x55, 0x55, 0x55, 0x55,
+ 0x55, 0x55, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,8) [] = { 0x66, 0x66, 0x66, 0x66,
+ 0x66, 0x66, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0x42073333, 0x42073333,
+ 0x42073333, 0x42073333 };
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
+@@ -0,0 +1,259 @@
++/* { dg-do run } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+ /* Try to read different places from the input buffer. */
-+ for (i=0; i<3; i++) {
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
+
-+ TEST_VLD1_DUP(vld1_dup_vector, buffer_dup, , poly, p, 64, 1);
-+ TEST_VLD1_DUP(vld1_dup_vector, buffer_dup, q, poly, p, 64, 2);
++/* Expected results. */
++VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0x11, 0x11, 0x11, 0x11 };
++VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff2, 0x22, 0x22 };
++VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
++VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff2, 0x66, 0x66 };
++VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
++VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff2, 0x66, 0x66 };
++VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0xcb00,
++ 0x4b4d, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0xf8, 0xfa, 0xfc, 0xfe,
++ 0x11, 0x11, 0x11, 0x11,
++ 0x11, 0x11, 0x11, 0x11 };
++VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
++ 0x22, 0x22, 0x22, 0x22 };
++VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff2,
++ 0x33, 0x33 };
++VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
++ 0x44 };
++VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0xf8, 0xfa, 0xfc, 0xfe,
++ 0x55, 0x55, 0x55, 0x55,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
++ 0x66, 0x66, 0x66, 0x66 };
++VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff2, 0x77, 0x77 };
++VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0x88 };
++VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf2, 0xf4, 0xf6,
++ 0xf8, 0xfa, 0xfc, 0xfe,
++ 0x55, 0x55, 0x55, 0x55,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff2, 0xfff4, 0xfff6,
++ 0x66, 0x66, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0xcb00, 0xca00, 0xc900,
++ 0x4b4d, 0x4b4d, 0x4b4d, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1600000,
++ 0x42073333, 0x42073333 };
+
-+ switch (i) {
-+ case 0:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected0, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected0, "");
-+ break;
-+ case 1:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected1, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected1, "");
-+ break;
-+ case 2:
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_dup_expected2, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_dup_expected2, "");
-+ break;
-+ default:
-+ abort();
-+ }
-+ }
++#define TEST_MSG "VUZP1"
++void exec_vuzp_half (void)
++{
++#define TEST_VUZP(PART, Q, T1, T2, W, N) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ vuzp##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \
++ VECT_VAR(vector2, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
-+ /* vld1_lane_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VLD1_LANE/VLD1_LANEQ"
++#define TEST_VUZP1(Q, T1, T2, W, N) TEST_VUZP(1, Q, T1, T2, W, N)
+
-+#define TEST_VLD1_LANE(Q, T1, T2, W, N, L) \
-+ memset (VECT_VAR(vld1_lane_buffer_src, T1, W, N), 0xAA, W/8*N); \
-+ VECT_VAR(vld1_lane_vector_src, T1, W, N) = \
-+ vld1##Q##_##T2##W(VECT_VAR(vld1_lane_buffer_src, T1, W, N)); \
-+ VECT_VAR(vld1_lane_vector, T1, W, N) = \
-+ vld1##Q##_lane_##T2##W(VECT_VAR(buffer, T1, W, N), \
-+ VECT_VAR(vld1_lane_vector_src, T1, W, N), L); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vld1_lane_vector, T1, W, N))
++ /* Input vector can only have 64 bits. */
++ DECL_VARIABLE_ALL_VARIANTS(vector);
++ DECL_VARIABLE_ALL_VARIANTS(vector2);
++ DECL_VARIABLE(vector, float, 64, 2);
++ DECL_VARIABLE(vector2, float, 64, 2);
+
-+ DECL_VARIABLE(vld1_lane_vector, poly, 64, 1);
-+ DECL_VARIABLE(vld1_lane_vector, poly, 64, 2);
-+ DECL_VARIABLE(vld1_lane_vector_src, poly, 64, 1);
-+ DECL_VARIABLE(vld1_lane_vector_src, poly, 64, 2);
++ DECL_VARIABLE_ALL_VARIANTS(vector_res);
++ DECL_VARIABLE(vector_res, float, 64, 2);
+
-+ ARRAY(vld1_lane_buffer_src, poly, 64, 1);
-+ ARRAY(vld1_lane_buffer_src, poly, 64, 2);
++ clean_results ();
++ /* We don't have vuzp1_T64x1, so set expected to the clean value. */
++ CLEAN(expected, int, 64, 1);
++ CLEAN(expected, uint, 64, 1);
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vector, buffer, , float, f, 32, 2);
++ VLOAD(vector, buffer, q, float, f, 32, 4);
++ VLOAD(vector, buffer, q, float, f, 64, 2);
++
++ /* Choose arbitrary initialization values. */
++ VDUP(vector2, , int, s, 8, 8, 0x11);
++ VDUP(vector2, , int, s, 16, 4, 0x22);
++ VDUP(vector2, , int, s, 32, 2, 0x33);
++ VDUP(vector2, , uint, u, 8, 8, 0x55);
++ VDUP(vector2, , uint, u, 16, 4, 0x66);
++ VDUP(vector2, , uint, u, 32, 2, 0x77);
++ VDUP(vector2, , poly, p, 8, 8, 0x55);
++ VDUP(vector2, , poly, p, 16, 4, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, , float, f, 16, 4, 14.6f); /* 14.6f is 0x4b4d. */
++#endif
++ VDUP(vector2, , float, f, 32, 2, 33.6f);
++
++ VDUP(vector2, q, int, s, 8, 16, 0x11);
++ VDUP(vector2, q, int, s, 16, 8, 0x22);
++ VDUP(vector2, q, int, s, 32, 4, 0x33);
++ VDUP(vector2, q, int, s, 64, 2, 0x44);
++ VDUP(vector2, q, uint, u, 8, 16, 0x55);
++ VDUP(vector2, q, uint, u, 16, 8, 0x66);
++ VDUP(vector2, q, uint, u, 32, 4, 0x77);
++ VDUP(vector2, q, uint, u, 64, 2, 0x88);
++ VDUP(vector2, q, poly, p, 8, 16, 0x55);
++ VDUP(vector2, q, poly, p, 16, 8, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, q, float, f, 16, 8, 14.6f);
++#endif
++ VDUP(vector2, q, float, f, 32, 4, 33.8f);
++ VDUP(vector2, q, float, f, 64, 2, 33.8f);
++
++ TEST_VUZP1(, int, s, 8, 8);
++ TEST_VUZP1(, int, s, 16, 4);
++ TEST_VUZP1(, int, s, 32, 2);
++ TEST_VUZP1(, uint, u, 8, 8);
++ TEST_VUZP1(, uint, u, 16, 4);
++ TEST_VUZP1(, uint, u, 32, 2);
++ TEST_VUZP1(, poly, p, 8, 8);
++ TEST_VUZP1(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VUZP1(, float, f, 16, 4);
++#endif
++ TEST_VUZP1(, float, f, 32, 2);
++
++ TEST_VUZP1(q, int, s, 8, 16);
++ TEST_VUZP1(q, int, s, 16, 8);
++ TEST_VUZP1(q, int, s, 32, 4);
++ TEST_VUZP1(q, int, s, 64, 2);
++ TEST_VUZP1(q, uint, u, 8, 16);
++ TEST_VUZP1(q, uint, u, 16, 8);
++ TEST_VUZP1(q, uint, u, 32, 4);
++ TEST_VUZP1(q, uint, u, 64, 2);
++ TEST_VUZP1(q, poly, p, 8, 16);
++ TEST_VUZP1(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VUZP1(q, float, f, 16, 8);
++#endif
++ TEST_VUZP1(q, float, f, 32, 4);
++ TEST_VUZP1(q, float, f, 64, 2);
+
-+ TEST_VLD1_LANE(, poly, p, 64, 1, 0);
-+ TEST_VLD1_LANE(q, poly, p, 64, 2, 0);
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
++ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_lane_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_lane_expected, "");
++#undef TEST_MSG
++#define TEST_MSG "VUZP2"
+
-+ /* vldX_p64 tests. */
-+#define DECL_VLDX(T1, W, N, X) \
-+ VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vldX_vector, T1, W, N, X); \
-+ VECT_VAR_DECL(vldX_result_bis_##X, T1, W, N)[X * N]
++#define TEST_VUZP2(Q, T1, T2, W, N) TEST_VUZP(2, Q, T1, T2, W, N)
+
-+#define TEST_VLDX(Q, T1, T2, W, N, X) \
-+ VECT_ARRAY_VAR(vldX_vector, T1, W, N, X) = \
-+ /* Use dedicated init buffer, of size X */ \
-+ vld##X##Q##_##T2##W(VECT_ARRAY_VAR(buffer_vld##X, T1, W, N, X)); \
-+ vst##X##Q##_##T2##W(VECT_VAR(vldX_result_bis_##X, T1, W, N), \
-+ VECT_ARRAY_VAR(vldX_vector, T1, W, N, X)); \
-+ memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(vldX_result_bis_##X, T1, W, N), \
-+ sizeof(VECT_VAR(result, T1, W, N)));
++/* Expected results. */
++VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0x11, 0x11, 0x11, 0x11 };
++VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff1, 0xfff3, 0x22, 0x22 };
++VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
++VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff1, 0xfff3, 0x66, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff1, 0xfff3, 0x66, 0x66 };
++VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb80, 0xca80,
++ 0x4b4d, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0xf9, 0xfb, 0xfd, 0xff,
++ 0x11, 0x11, 0x11, 0x11,
++ 0x11, 0x11, 0x11, 0x11 };
++VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
++ 0x22, 0x22, 0x22, 0x22 };
++VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff1, 0xfffffff3,
++ 0x33, 0x33 };
++VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
++ 0x44 };
++VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0xf9, 0xfb, 0xfd, 0xff,
++ 0x55, 0x55, 0x55, 0x55,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
++ 0x66, 0x66, 0x66, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff1, 0xfffffff3, 0x77, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
++ 0x88 };
++VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
++ 0xf9, 0xfb, 0xfd, 0xff,
++ 0x55, 0x55, 0x55, 0x55,
++ 0x55, 0x55, 0x55, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff1, 0xfff3, 0xfff5, 0xfff7,
++ 0x66, 0x66, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xcb80, 0xca80, 0xc980, 0xc880,
++ 0x4b4d, 0x4b4d, 0x4b4d, 0x4b4d
++ };
++#endif
++VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1700000, 0xc1500000,
++ 0x42073333, 0x42073333 };
+
-+ /* Overwrite "result" with the contents of "result_bis"[Y]. */
-+#define TEST_EXTRA_CHUNK(T1, W, N, X,Y) \
-+ memcpy(VECT_VAR(result, T1, W, N), \
-+ &(VECT_VAR(vldX_result_bis_##X, T1, W, N)[Y*N]), \
-+ sizeof(VECT_VAR(result, T1, W, N)));
++ clean_results ();
++ CLEAN(expected2, int, 64, 1);
++ CLEAN(expected2, uint, 64, 1);
++
++ TEST_VUZP2(, int, s, 8, 8);
++ TEST_VUZP2(, int, s, 16, 4);
++ TEST_VUZP2(, int, s, 32, 2);
++ TEST_VUZP2(, uint, u, 8, 8);
++ TEST_VUZP2(, uint, u, 16, 4);
++ TEST_VUZP2(, uint, u, 32, 2);
++ TEST_VUZP2(, poly, p, 8, 8);
++ TEST_VUZP2(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VUZP2(, float, f, 16, 4);
++#endif
++ TEST_VUZP2(, float, f, 32, 2);
++
++ TEST_VUZP2(q, int, s, 8, 16);
++ TEST_VUZP2(q, int, s, 16, 8);
++ TEST_VUZP2(q, int, s, 32, 4);
++ TEST_VUZP2(q, int, s, 64, 2);
++ TEST_VUZP2(q, uint, u, 8, 16);
++ TEST_VUZP2(q, uint, u, 16, 8);
++ TEST_VUZP2(q, uint, u, 32, 4);
++ TEST_VUZP2(q, uint, u, 64, 2);
++ TEST_VUZP2(q, poly, p, 8, 16);
++ TEST_VUZP2(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VUZP2(q, float, f, 16, 8);
++#endif
++ TEST_VUZP2(q, float, f, 32, 4);
++ TEST_VUZP2(q, float, f, 64, 2);
+
-+ DECL_VLDX(poly, 64, 1, 2);
-+ DECL_VLDX(poly, 64, 1, 3);
-+ DECL_VLDX(poly, 64, 1, 4);
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++#if defined (FP16_SUPPORTED)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
++#endif
++}
+
-+ VECT_ARRAY_INIT2(buffer_vld2, poly, 64, 1);
-+ PAD(buffer_vld2_pad, poly, 64, 1);
-+ VECT_ARRAY_INIT3(buffer_vld3, poly, 64, 1);
-+ PAD(buffer_vld3_pad, poly, 64, 1);
-+ VECT_ARRAY_INIT4(buffer_vld4, poly, 64, 1);
-+ PAD(buffer_vld4_pad, poly, 64, 1);
++int main (void)
++{
++ exec_vuzp_half ();
++ return 0;
++}
+--- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip.c
+@@ -18,6 +18,10 @@ VECT_VAR_DECL(expected0,poly,8,8) [] = { 0xf0, 0xf4, 0x55, 0x55,
+ 0xf1, 0xf5, 0x55, 0x55 };
+ VECT_VAR_DECL(expected0,poly,16,4) [] = { 0xfff0, 0xfff2,
+ 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 4) [] = { 0xcc00, 0xcb00,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+ VECT_VAR_DECL(expected0,int,8,16) [] = { 0xf0, 0xf8, 0x11, 0x11,
+ 0xf1, 0xf9, 0x11, 0x11,
+@@ -41,6 +45,12 @@ VECT_VAR_DECL(expected0,poly,8,16) [] = { 0xf0, 0xf8, 0x55, 0x55,
+ 0xf3, 0xfb, 0x55, 0x55 };
+ VECT_VAR_DECL(expected0,poly,16,8) [] = { 0xfff0, 0xfff4, 0x66, 0x66,
+ 0xfff1, 0xfff5, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected0, hfloat, 16, 8) [] = { 0xcc00, 0xca00,
++ 0x4b4d, 0x4b4d,
++ 0xcb80, 0xc980,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected0,hfloat,32,4) [] = { 0xc1800000, 0xc1600000,
+ 0x42073333, 0x42073333 };
+
+@@ -59,6 +69,10 @@ VECT_VAR_DECL(expected1,poly,8,8) [] = { 0xf2, 0xf6, 0x55, 0x55,
+ 0xf3, 0xf7, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,4) [] = { 0xfff1, 0xfff3,
+ 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 4) [] = { 0xcb80, 0xca80,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,2) [] = { 0x42066666, 0x42066666 };
+ VECT_VAR_DECL(expected1,int,8,16) [] = { 0xf4, 0xfc, 0x11, 0x11,
+ 0xf5, 0xfd, 0x11, 0x11,
+@@ -82,6 +96,12 @@ VECT_VAR_DECL(expected1,poly,8,16) [] = { 0xf4, 0xfc, 0x55, 0x55,
+ 0xf7, 0xff, 0x55, 0x55 };
+ VECT_VAR_DECL(expected1,poly,16,8) [] = { 0xfff2, 0xfff6, 0x66, 0x66,
+ 0xfff3, 0xfff7, 0x66, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected1, hfloat, 16, 8) [] = { 0xcb00, 0xc900,
++ 0x4b4d, 0x4b4d,
++ 0xca80, 0xc880,
++ 0x4b4d, 0x4b4d };
++#endif
+ VECT_VAR_DECL(expected1,hfloat,32,4) [] = { 0xc1700000, 0xc1500000,
+ 0x42073333, 0x42073333 };
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
+@@ -0,0 +1,263 @@
++/* { dg-do run } */
++/* { dg-skip-if "" { arm*-*-* } } */
+
-+#undef TEST_MSG
-+#define TEST_MSG "VLD2/VLD2Q"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX(, poly, p, 64, 1, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 2, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_expected_1, "chunk 1");
++#include <arm_neon.h>
++#include "arm-neon-ref.h"
++#include "compute-ref-data.h"
++
++/* Expected results. */
++VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0x11, 0xf1, 0x11,
++ 0xf2, 0x11, 0xf3, 0x11 };
++VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0x22, 0xfff1, 0x22 };
++VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0x33 };
++VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0x55, 0xf1, 0x55,
++ 0xf2, 0x55, 0xf3, 0x55 };
++VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0x66, 0xfff1, 0x66 };
++VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0x77 };
++VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
++VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0x55, 0xf1, 0x55,
++ 0xf2, 0x55, 0xf3, 0x55 };
++VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0x66, 0xfff1, 0x66 };
++VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 4) [] = { 0xcc00, 0x4b4d,
++ 0xcb80, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0x11, 0xf1, 0x11,
++ 0xf2, 0x11, 0xf3, 0x11,
++ 0xf4, 0x11, 0xf5, 0x11,
++ 0xf6, 0x11, 0xf7, 0x11 };
++VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0x22, 0xfff1, 0x22,
++ 0xfff2, 0x22, 0xfff3, 0x22 };
++VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0x33,
++ 0xfffffff1, 0x33 };
++VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
++ 0x44 };
++VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0x55, 0xf1, 0x55,
++ 0xf2, 0x55, 0xf3, 0x55,
++ 0xf4, 0x55, 0xf5, 0x55,
++ 0xf6, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0x66, 0xfff1, 0x66,
++ 0xfff2, 0x66, 0xfff3, 0x66 };
++VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0x77,
++ 0xfffffff1, 0x77 };
++VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
++ 0x88 };
++VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0x55, 0xf1, 0x55,
++ 0xf2, 0x55, 0xf3, 0x55,
++ 0xf4, 0x55, 0xf5, 0x55,
++ 0xf6, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0x66, 0xfff1, 0x66,
++ 0xfff2, 0x66, 0xfff3, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected, hfloat, 16, 8) [] = { 0xcc00, 0x4b4d,
++ 0xcb80, 0x4b4d,
++ 0xcb00, 0x4b4d,
++ 0xca80, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0x42073333,
++ 0xc1700000, 0x42073333 };
+
-+#undef TEST_MSG
-+#define TEST_MSG "VLD3/VLD3Q"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX(, poly, p, 64, 1, 3);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 3, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_1, "chunk 1");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 3, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_expected_2, "chunk 2");
++#define TEST_MSG "VZIP1"
++void exec_vzip_half (void)
++{
++#define TEST_VZIP(PART, Q, T1, T2, W, N) \
++ VECT_VAR(vector_res, T1, W, N) = \
++ vzip##PART##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \
++ VECT_VAR(vector2, T1, W, N)); \
++ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
-+#undef TEST_MSG
-+#define TEST_MSG "VLD4/VLD4Q"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX(, poly, p, 64, 1, 4);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_1, "chunk 1");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_2, "chunk 2");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_EXTRA_CHUNK(poly, 64, 1, 4, 3);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_expected_3, "chunk 3");
++#define TEST_VZIP1(Q, T1, T2, W, N) TEST_VZIP(1, Q, T1, T2, W, N)
+
-+ /* vldX_dup_p64 tests. */
-+#define DECL_VLDX_DUP(T1, W, N, X) \
-+ VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X); \
-+ VECT_VAR_DECL(vldX_dup_result_bis_##X, T1, W, N)[X * N]
++ /* Input vector can only have 64 bits. */
++ DECL_VARIABLE_ALL_VARIANTS(vector);
++ DECL_VARIABLE_ALL_VARIANTS(vector2);
++ DECL_VARIABLE(vector, float, 64, 2);
++ DECL_VARIABLE(vector2, float, 64, 2);
+
-+#define TEST_VLDX_DUP(Q, T1, T2, W, N, X) \
-+ VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X) = \
-+ vld##X##Q##_dup_##T2##W(&VECT_VAR(buffer_dup, T1, W, N)[0]); \
-+ \
-+ vst##X##Q##_##T2##W(VECT_VAR(vldX_dup_result_bis_##X, T1, W, N), \
-+ VECT_ARRAY_VAR(vldX_dup_vector, T1, W, N, X)); \
-+ memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(vldX_dup_result_bis_##X, T1, W, N), \
-+ sizeof(VECT_VAR(result, T1, W, N)));
++ DECL_VARIABLE_ALL_VARIANTS(vector_res);
++ DECL_VARIABLE(vector_res, float, 64, 2);
+
-+ /* Overwrite "result" with the contents of "result_bis"[Y]. */
-+#define TEST_VLDX_DUP_EXTRA_CHUNK(T1, W, N, X,Y) \
-+ memcpy(VECT_VAR(result, T1, W, N), \
-+ &(VECT_VAR(vldX_dup_result_bis_##X, T1, W, N)[Y*N]), \
-+ sizeof(VECT_VAR(result, T1, W, N)));
++ clean_results ();
++ /* We don't have vzip1_T64x1, so set expected to the clean value. */
++ CLEAN(expected, int, 64, 1);
++ CLEAN(expected, uint, 64, 1);
+
-+ DECL_VLDX_DUP(poly, 64, 1, 2);
-+ DECL_VLDX_DUP(poly, 64, 1, 3);
-+ DECL_VLDX_DUP(poly, 64, 1, 4);
++ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
++#if defined (FP16_SUPPORTED)
++ VLOAD(vector, buffer, , float, f, 16, 4);
++ VLOAD(vector, buffer, q, float, f, 16, 8);
++#endif
++ VLOAD(vector, buffer, , float, f, 32, 2);
++ VLOAD(vector, buffer, q, float, f, 32, 4);
++ VLOAD(vector, buffer, q, float, f, 64, 2);
++
++ /* Choose arbitrary initialization values. */
++ VDUP(vector2, , int, s, 8, 8, 0x11);
++ VDUP(vector2, , int, s, 16, 4, 0x22);
++ VDUP(vector2, , int, s, 32, 2, 0x33);
++ VDUP(vector2, , uint, u, 8, 8, 0x55);
++ VDUP(vector2, , uint, u, 16, 4, 0x66);
++ VDUP(vector2, , uint, u, 32, 2, 0x77);
++ VDUP(vector2, , poly, p, 8, 8, 0x55);
++ VDUP(vector2, , poly, p, 16, 4, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, , float, f, 16, 4, 14.6f); /* 14.6f is 0x4b4d. */
++#endif
++ VDUP(vector2, , float, f, 32, 2, 33.6f);
++
++ VDUP(vector2, q, int, s, 8, 16, 0x11);
++ VDUP(vector2, q, int, s, 16, 8, 0x22);
++ VDUP(vector2, q, int, s, 32, 4, 0x33);
++ VDUP(vector2, q, int, s, 64, 2, 0x44);
++ VDUP(vector2, q, uint, u, 8, 16, 0x55);
++ VDUP(vector2, q, uint, u, 16, 8, 0x66);
++ VDUP(vector2, q, uint, u, 32, 4, 0x77);
++ VDUP(vector2, q, uint, u, 64, 2, 0x88);
++ VDUP(vector2, q, poly, p, 8, 16, 0x55);
++ VDUP(vector2, q, poly, p, 16, 8, 0x66);
++#if defined (FP16_SUPPORTED)
++ VDUP (vector2, q, float, f, 16, 8, 14.6f);
++#endif
++ VDUP(vector2, q, float, f, 32, 4, 33.8f);
++ VDUP(vector2, q, float, f, 64, 2, 33.8f);
++
++ TEST_VZIP1(, int, s, 8, 8);
++ TEST_VZIP1(, int, s, 16, 4);
++ TEST_VZIP1(, int, s, 32, 2);
++ TEST_VZIP1(, uint, u, 8, 8);
++ TEST_VZIP1(, uint, u, 16, 4);
++ TEST_VZIP1(, uint, u, 32, 2);
++ TEST_VZIP1(, poly, p, 8, 8);
++ TEST_VZIP1(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VZIP1(, float, f, 16, 4);
++#endif
++ TEST_VZIP1(, float, f, 32, 2);
++
++ TEST_VZIP1(q, int, s, 8, 16);
++ TEST_VZIP1(q, int, s, 16, 8);
++ TEST_VZIP1(q, int, s, 32, 4);
++ TEST_VZIP1(q, int, s, 64, 2);
++ TEST_VZIP1(q, uint, u, 8, 16);
++ TEST_VZIP1(q, uint, u, 16, 8);
++ TEST_VZIP1(q, uint, u, 32, 4);
++ TEST_VZIP1(q, uint, u, 64, 2);
++ TEST_VZIP1(q, poly, p, 8, 16);
++ TEST_VZIP1(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VZIP1(q, float, f, 16, 8);
++#endif
++ TEST_VZIP1(q, float, f, 32, 4);
++ TEST_VZIP1(q, float, f, 64, 2);
+
++#if defined (FP16_SUPPORTED)
++ CHECK_RESULTS (TEST_MSG, "");
++#else
++ CHECK_RESULTS_NO_FP16 (TEST_MSG, "");
++#endif
+
+#undef TEST_MSG
-+#define TEST_MSG "VLD2_DUP/VLD2Q_DUP"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP(, poly, p, 64, 1, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_dup_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 2, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld2_dup_expected_1, "chunk 1");
++#define TEST_MSG "VZIP2"
+
-+#undef TEST_MSG
-+#define TEST_MSG "VLD3_DUP/VLD3Q_DUP"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP(, poly, p, 64, 1, 3);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 3, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_1, "chunk 1");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 3, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld3_dup_expected_2, "chunk 2");
++#define TEST_VZIP2(Q, T1, T2, W, N) TEST_VZIP(2, Q, T1, T2, W, N)
+
-+#undef TEST_MSG
-+#define TEST_MSG "VLD4_DUP/VLD4Q_DUP"
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP(, poly, p, 64, 1, 4);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_0, "chunk 0");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 1);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_1, "chunk 1");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 2);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_2, "chunk 2");
-+ CLEAN(result, poly, 64, 1);
-+ TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 3);
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_3, "chunk 3");
++/* Expected results. */
++VECT_VAR_DECL(expected2,int,8,8) [] = { 0xf4, 0x11, 0xf5, 0x11,
++ 0xf6, 0x11, 0xf7, 0x11 };
++VECT_VAR_DECL(expected2,int,16,4) [] = { 0xfff2, 0x22, 0xfff3, 0x22 };
++VECT_VAR_DECL(expected2,int,32,2) [] = { 0xfffffff1, 0x33 };
++VECT_VAR_DECL(expected2,int,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,uint,8,8) [] = { 0xf4, 0x55, 0xf5, 0x55,
++ 0xf6, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,4) [] = { 0xfff2, 0x66, 0xfff3, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,2) [] = { 0xfffffff1, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,1) [] = { 0xfffffffffffffff1 };
++VECT_VAR_DECL(expected2,poly,8,8) [] = { 0xf4, 0x55, 0xf5, 0x55,
++ 0xf6, 0x55, 0xf7, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,4) [] = { 0xfff2, 0x66, 0xfff3, 0x66 };
++VECT_VAR_DECL(expected2,hfloat,32,2) [] = { 0xc1700000, 0x42066666 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 4) [] = { 0xcb00, 0x4b4d,
++ 0xca80, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected2,int,8,16) [] = { 0xf8, 0x11, 0xf9, 0x11,
++ 0xfa, 0x11, 0xfb, 0x11,
++ 0xfc, 0x11, 0xfd, 0x11,
++ 0xfe, 0x11, 0xff, 0x11 };
++VECT_VAR_DECL(expected2,int,16,8) [] = { 0xfff4, 0x22, 0xfff5, 0x22,
++ 0xfff6, 0x22, 0xfff7, 0x22 };
++VECT_VAR_DECL(expected2,int,32,4) [] = { 0xfffffff2, 0x33,
++ 0xfffffff3, 0x33 };
++VECT_VAR_DECL(expected2,int,64,2) [] = { 0xfffffffffffffff1,
++ 0x44 };
++VECT_VAR_DECL(expected2,uint,8,16) [] = { 0xf8, 0x55, 0xf9, 0x55,
++ 0xfa, 0x55, 0xfb, 0x55,
++ 0xfc, 0x55, 0xfd, 0x55,
++ 0xfe, 0x55, 0xff, 0x55 };
++VECT_VAR_DECL(expected2,uint,16,8) [] = { 0xfff4, 0x66, 0xfff5, 0x66,
++ 0xfff6, 0x66, 0xfff7, 0x66 };
++VECT_VAR_DECL(expected2,uint,32,4) [] = { 0xfffffff2, 0x77,
++ 0xfffffff3, 0x77 };
++VECT_VAR_DECL(expected2,uint,64,2) [] = { 0xfffffffffffffff1,
++ 0x88 };
++VECT_VAR_DECL(expected2,poly,8,16) [] = { 0xf8, 0x55, 0xf9, 0x55,
++ 0xfa, 0x55, 0xfb, 0x55,
++ 0xfc, 0x55, 0xfd, 0x55,
++ 0xfe, 0x55, 0xff, 0x55 };
++VECT_VAR_DECL(expected2,poly,16,8) [] = { 0xfff4, 0x66, 0xfff5, 0x66,
++ 0xfff6, 0x66, 0xfff7, 0x66 };
++#if defined (FP16_SUPPORTED)
++VECT_VAR_DECL (expected2, hfloat, 16, 8) [] = { 0xca00, 0x4b4d,
++ 0xc980, 0x4b4d,
++ 0xc900, 0x4b4d,
++ 0xc880, 0x4b4d };
++#endif
++VECT_VAR_DECL(expected2,hfloat,32,4) [] = { 0xc1600000, 0x42073333,
++ 0xc1500000, 0x42073333 };
++ clean_results ();
++ CLEAN(expected2, int, 64, 1);
++ CLEAN(expected2, uint, 64, 1);
++
++ TEST_VZIP2(, int, s, 8, 8);
++ TEST_VZIP2(, int, s, 16, 4);
++ TEST_VZIP2(, int, s, 32, 2);
++ TEST_VZIP2(, uint, u, 8, 8);
++ TEST_VZIP2(, uint, u, 16, 4);
++ TEST_VZIP2(, uint, u, 32, 2);
++ TEST_VZIP2(, poly, p, 8, 8);
++ TEST_VZIP2(, poly, p, 16, 4);
++#if defined (FP16_SUPPORTED)
++ TEST_VZIP2(, float, f, 16, 4);
++#endif
++ TEST_VZIP2(, float, f, 32, 2);
++
++ TEST_VZIP2(q, int, s, 8, 16);
++ TEST_VZIP2(q, int, s, 16, 8);
++ TEST_VZIP2(q, int, s, 32, 4);
++ TEST_VZIP2(q, int, s, 64, 2);
++ TEST_VZIP2(q, uint, u, 8, 16);
++ TEST_VZIP2(q, uint, u, 16, 8);
++ TEST_VZIP2(q, uint, u, 32, 4);
++ TEST_VZIP2(q, uint, u, 64, 2);
++ TEST_VZIP2(q, poly, p, 8, 16);
++ TEST_VZIP2(q, poly, p, 16, 8);
++#if defined (FP16_SUPPORTED)
++ TEST_VZIP2(q, float, f, 16, 8);
++#endif
++ TEST_VZIP2(q, float, f, 32, 4);
++ TEST_VZIP2(q, float, f, 64, 2);
+
-+ /* vsli_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VSLI"
++ CHECK_RESULTS_NAMED (TEST_MSG, expected2, "");
++#if defined (FP16_SUPPORTED)
++ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected2, "");
++ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected2, "");
++#endif
++}
+
-+#define TEST_VSXI1(INSN, Q, T1, T2, W, N, V) \
-+ VECT_VAR(vsXi_vector_res, T1, W, N) = \
-+ INSN##Q##_n_##T2##W(VECT_VAR(vsXi_vector, T1, W, N), \
-+ VECT_VAR(vsXi_vector2, T1, W, N), \
-+ V); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vsXi_vector_res, T1, W, N))
++int main (void)
++{
++ exec_vzip_half ();
++ return 0;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/ands_3.c
+@@ -0,0 +1,12 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
+
-+#define TEST_VSXI(INSN, Q, T1, T2, W, N, V) \
-+ TEST_VSXI1(INSN, Q, T1, T2, W, N, V)
++int
++f9 (unsigned char x, int y)
++{
++ if (y > 1 && x == 0)
++ return 10;
++ return x;
++}
+
-+ DECL_VARIABLE(vsXi_vector, poly, 64, 1);
-+ DECL_VARIABLE(vsXi_vector, poly, 64, 2);
-+ DECL_VARIABLE(vsXi_vector2, poly, 64, 1);
-+ DECL_VARIABLE(vsXi_vector2, poly, 64, 2);
-+ DECL_VARIABLE(vsXi_vector_res, poly, 64, 1);
-+ DECL_VARIABLE(vsXi_vector_res, poly, 64, 2);
++/* { dg-final { scan-assembler "ands\t(x|w)\[0-9\]+,\[ \t\]*(x|w)\[0-9\]+,\[ \t\]*255" } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c
+@@ -1,4 +1,5 @@
+ /* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
++/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
+ /* { dg-options "-O2 -mcpu=dummy" } */
+
+ void f ()
+--- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c
+@@ -1,4 +1,5 @@
+ /* { dg-error "missing" "" {target "aarch64*-*-*" } } */
++/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
+ /* { dg-options "-O2 -mcpu=cortex-a53+no" } */
+
+ void f ()
+--- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c
+@@ -1,4 +1,5 @@
+ /* { dg-error "invalid feature" "" {target "aarch64*-*-*" } } */
++/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
+ /* { dg-options "-O2 -mcpu=cortex-a53+dummy" } */
+
+ void f ()
+--- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c
+@@ -1,4 +1,5 @@
+ /* { dg-error "missing" "" {target "aarch64*-*-*" } } */
++/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
+ /* { dg-options "-O2 -mcpu=+dummy" } */
+
+ void f ()
+--- a/src/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
+@@ -110,6 +110,6 @@ main (int argc, char **argv)
+ /* vfmaq_lane_f64.
+ vfma_laneq_f64.
+ vfmaq_laneq_f64. */
+-/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d\\\[\[0-9\]+\\\]" 3 } } */
++/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2?d\\\[\[0-9\]+\\\]" 3 } } */
+
+
+--- a/src/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
+@@ -111,6 +111,6 @@ main (int argc, char **argv)
+ /* vfmsq_lane_f64.
+ vfms_laneq_f64.
+ vfmsq_laneq_f64. */
+-/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d\\\[\[0-9\]+\\\]" 3 } } */
++/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2?d\\\[\[0-9\]+\\\]" 3 } } */
+
+
+--- a/src/gcc/testsuite/gcc.target/aarch64/fmovd-zero-reg.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/fmovd-zero-reg.c
+@@ -8,4 +8,4 @@ foo (void)
+ bar (0.0);
+ }
+
+-/* { dg-final { scan-assembler "fmov\\td0, xzr" } } */
++/* { dg-final { scan-assembler "movi\\td0, #0" } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/fmovf-zero-reg.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/fmovf-zero-reg.c
+@@ -8,4 +8,4 @@ foo (void)
+ bar (0.0);
+ }
+
+-/* { dg-final { scan-assembler "fmov\\ts0, wzr" } } */
++/* { dg-final { scan-assembler "movi\\tv0\.2s, #0" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_subreg_1.c
+@@ -0,0 +1,30 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -fdump-rtl-ce1" } */
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++/* Check that the inner if is transformed into CSELs. */
+
-+ VLOAD(vsXi_vector, buffer, , poly, p, 64, 1);
-+ VLOAD(vsXi_vector, buffer, q, poly, p, 64, 2);
++int
++foo (int *x, int *z, int a)
++{
++ int b = 0;
++ int c = 0;
++ int d = 0;
++ int i;
+
-+ VDUP(vsXi_vector2, , poly, p, 64, 1, 2);
-+ VDUP(vsXi_vector2, q, poly, p, 64, 2, 3);
++ for (i = 0; i < a; i++)
++ {
++ if (x[i] < c)
++ {
++ b = z[i];
++ if (c < b)
++ {
++ c = b;
++ d = i;
++ }
++ }
++ }
+
-+ TEST_VSXI(vsli, , poly, p, 64, 1, 3);
-+ TEST_VSXI(vsli, q, poly, p, 64, 2, 53);
++ return c + d;
++}
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsli_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsli_expected, "");
++/* { dg-final { scan-rtl-dump "if-conversion succeeded through noce_convert_multiple_sets" "ce1" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
+@@ -0,0 +1,20 @@
++/* { dg-options "-O2" } */
+
-+ /* Test cases with maximum shift amount. */
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++/* Check that we can use a REG + IMM addressing mode when moving an unaligned
++ TImode value to and from memory. */
+
-+ TEST_VSXI(vsli, , poly, p, 64, 1, 63);
-+ TEST_VSXI(vsli, q, poly, p, 64, 2, 63);
++struct foo
++{
++ long long b;
++ __int128 a;
++} __attribute__ ((packed));
+
-+#define COMMENT "(max shift amount)"
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsli_expected_max_shift, COMMENT);
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsli_expected_max_shift, COMMENT);
++void
++bar (struct foo *p, struct foo *q)
++{
++ p->a = q->a;
++}
++
++/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
++/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
++/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
+@@ -0,0 +1,46 @@
++/* Test that we can remove the conditional move due to CLZ
++ and CTZ being defined at zero. */
++
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++
++int
++fooctz (int i)
++{
++ return (i == 0) ? 32 : __builtin_ctz (i);
++}
++
++int
++fooctz2 (int i)
++{
++ return (i != 0) ? __builtin_ctz (i) : 32;
++}
++
++unsigned int
++fooctz3 (unsigned int i)
++{
++ return (i > 0) ? __builtin_ctz (i) : 32;
++}
+
-+ /* vsri_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VSRI"
++/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++int
++fooclz (int i)
++{
++ return (i == 0) ? 32 : __builtin_clz (i);
++}
+
-+ VLOAD(vsXi_vector, buffer, , poly, p, 64, 1);
-+ VLOAD(vsXi_vector, buffer, q, poly, p, 64, 2);
++int
++fooclz2 (int i)
++{
++ return (i != 0) ? __builtin_clz (i) : 32;
++}
+
-+ VDUP(vsXi_vector2, , poly, p, 64, 1, 2);
-+ VDUP(vsXi_vector2, q, poly, p, 64, 2, 3);
++unsigned int
++fooclz3 (unsigned int i)
++{
++ return (i > 0) ? __builtin_clz (i) : 32;
++}
+
-+ TEST_VSXI(vsri, , poly, p, 64, 1, 3);
-+ TEST_VSXI(vsri, q, poly, p, 64, 2, 53);
++/* { dg-final { scan-assembler-times "clz\t" 6 } } */
++/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/pr63874.c
+@@ -0,0 +1,22 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-skip-if "Not applicable for mcmodel=large" { aarch64*-*-* } { "-mcmodel=large" } { "" } } */
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsri_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsri_expected, "");
++extern void __attribute__((weak)) foo_weakref (void);
++void __attribute__((weak, noinline)) bar (void)
++{
++ return;
++}
++void (*f) (void);
++void (*g) (void);
+
-+ /* Test cases with maximum shift amount. */
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++int
++main (void)
++{
++ f = &foo_weakref;
++ g = &bar;
++ return 0;
++}
+
-+ TEST_VSXI(vsri, , poly, p, 64, 1, 64);
-+ TEST_VSXI(vsri, q, poly, p, 64, 2, 64);
++/* { dg-final { scan-assembler-not "adr*foo_weakref" } } */
++/* { dg-final { scan-assembler-not "\\.(word|xword)\tbar" } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
+@@ -1,4 +1,4 @@
+-/* Test the `v[min|max]nm{q}_f*' AArch64 SIMD intrinsic. */
++/* Test the `v[min|max]{nm}{q}_f*' AArch64 SIMD intrinsic. */
+
+ /* { dg-do run } */
+ /* { dg-options "-O2" } */
+@@ -18,6 +18,7 @@ extern void abort ();
+ int
+ main (int argc, char **argv)
+ {
++ /* v{min|max}nm_f32 normal. */
+ float32x2_t f32x2_input1 = vdup_n_f32 (-1.0);
+ float32x2_t f32x2_input2 = vdup_n_f32 (0.0);
+ float32x2_t f32x2_exp_minnm = vdup_n_f32 (-1.0);
+@@ -28,6 +29,7 @@ main (int argc, char **argv)
+ CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+ CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
++ /* v{min|max}nm_f32 NaN. */
+ f32x2_input1 = vdup_n_f32 (__builtin_nanf (""));
+ f32x2_input2 = vdup_n_f32 (1.0);
+ f32x2_exp_minnm = vdup_n_f32 (1.0);
+@@ -38,6 +40,7 @@ main (int argc, char **argv)
+ CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+ CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
++ /* v{min|max}nmq_f32 normal. */
+ float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0);
+ float32x4_t f32x4_input2 = vdupq_n_f32 (77.0);
+ float32x4_t f32x4_exp_minnm = vdupq_n_f32 (-1024.0);
+@@ -48,6 +51,7 @@ main (int argc, char **argv)
+ CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
+ CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
+
++ /* v{min|max}nmq_f32 NaN. */
+ f32x4_input1 = vdupq_n_f32 (-__builtin_nanf (""));
+ f32x4_input2 = vdupq_n_f32 (-1.0);
+ f32x4_exp_minnm = vdupq_n_f32 (-1.0);
+@@ -58,16 +62,57 @@ main (int argc, char **argv)
+ CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
+ CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
+
++ /* v{min|max}nm_f64 normal. */
++ float64x1_t f64x1_input1 = vdup_n_f64 (1.23);
++ float64x1_t f64x1_input2 = vdup_n_f64 (4.56);
++ float64x1_t f64x1_exp_minnm = vdup_n_f64 (1.23);
++ float64x1_t f64x1_exp_maxnm = vdup_n_f64 (4.56);
++ float64x1_t f64x1_ret_minnm = vminnm_f64 (f64x1_input1, f64x1_input2);
++ float64x1_t f64x1_ret_maxnm = vmaxnm_f64 (f64x1_input1, f64x1_input2);
++ CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
++ CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
++
++ /* v{min|max}_f64 normal. */
++ float64x1_t f64x1_exp_min = vdup_n_f64 (1.23);
++ float64x1_t f64x1_exp_max = vdup_n_f64 (4.56);
++ float64x1_t f64x1_ret_min = vmin_f64 (f64x1_input1, f64x1_input2);
++ float64x1_t f64x1_ret_max = vmax_f64 (f64x1_input1, f64x1_input2);
++ CHECK (uint64_t, 1, f64x1_ret_min, f64x1_exp_min);
++ CHECK (uint64_t, 1, f64x1_ret_max, f64x1_exp_max);
++
++ /* v{min|max}nmq_f64 normal. */
+ float64x2_t f64x2_input1 = vdupq_n_f64 (1.23);
+ float64x2_t f64x2_input2 = vdupq_n_f64 (4.56);
+ float64x2_t f64x2_exp_minnm = vdupq_n_f64 (1.23);
+ float64x2_t f64x2_exp_maxnm = vdupq_n_f64 (4.56);
+ float64x2_t f64x2_ret_minnm = vminnmq_f64 (f64x2_input1, f64x2_input2);
+ float64x2_t f64x2_ret_maxnm = vmaxnmq_f64 (f64x2_input1, f64x2_input2);
+-
+ CHECK (uint64_t, 2, f64x2_ret_minnm, f64x2_exp_minnm);
+ CHECK (uint64_t, 2, f64x2_ret_maxnm, f64x2_exp_maxnm);
+
++ /* v{min|max}nm_f64 NaN. */
++ f64x1_input1 = vdup_n_f64 (-__builtin_nanf (""));
++ f64x1_input2 = vdup_n_f64 (1.0);
++ f64x1_exp_minnm = vdup_n_f64 (1.0);
++ f64x1_exp_maxnm = vdup_n_f64 (1.0);
++ f64x1_ret_minnm = vminnm_f64 (f64x1_input1, f64x1_input2);
++ f64x1_ret_maxnm = vmaxnm_f64 (f64x1_input1, f64x1_input2);
++
++ CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
++ CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
++
++ /* v{min|max}_f64 NaN. */
++ f64x1_input1 = vdup_n_f64 (-__builtin_nanf (""));
++ f64x1_input2 = vdup_n_f64 (1.0);
++ f64x1_exp_minnm = vdup_n_f64 (-__builtin_nanf (""));
++ f64x1_exp_maxnm = vdup_n_f64 (-__builtin_nanf (""));
++ f64x1_ret_minnm = vmin_f64 (f64x1_input1, f64x1_input2);
++ f64x1_ret_maxnm = vmax_f64 (f64x1_input1, f64x1_input2);
++
++ CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
++ CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
++
++ /* v{min|max}nmq_f64 NaN. */
+ f64x2_input1 = vdupq_n_f64 (-__builtin_nan (""));
+ f64x2_input2 = vdupq_n_f64 (1.0);
+ f64x2_exp_minnm = vdupq_n_f64 (1.0);
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
+@@ -0,0 +1,541 @@
++/* Test the vmul_n_f64 AArch64 SIMD intrinsic. */
+
-+#define COMMENT "(max shift amount)"
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vsri_expected_max_shift, COMMENT);
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vsri_expected_max_shift, COMMENT);
++/* { dg-do run } */
++/* { dg-options "-O2 --save-temps" } */
+
-+ /* vst1_lane_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VST1_LANE/VST1_LANEQ"
++#include "arm_neon.h"
+
-+#define TEST_VST1_LANE(Q, T1, T2, W, N, L) \
-+ VECT_VAR(vst1_lane_vector, T1, W, N) = \
-+ vld1##Q##_##T2##W(VECT_VAR(buffer, T1, W, N)); \
-+ vst1##Q##_lane_##T2##W(VECT_VAR(result, T1, W, N), \
-+ VECT_VAR(vst1_lane_vector, T1, W, N), L)
++extern void abort (void);
+
-+ DECL_VARIABLE(vst1_lane_vector, poly, 64, 1);
-+ DECL_VARIABLE(vst1_lane_vector, poly, 64, 2);
++#define A (132.4f)
++#define B (-0.0f)
++#define C (-34.8f)
++#define D (289.34f)
++float32_t expected2_1[2] = {A * A, B * A};
++float32_t expected2_2[2] = {A * B, B * B};
++float32_t expected4_1[4] = {A * A, B * A, C * A, D * A};
++float32_t expected4_2[4] = {A * B, B * B, C * B, D * B};
++float32_t expected4_3[4] = {A * C, B * C, C * C, D * C};
++float32_t expected4_4[4] = {A * D, B * D, C * D, D * D};
++float32_t _elemA = A;
++float32_t _elemB = B;
++float32_t _elemC = C;
++float32_t _elemD = D;
+
-+ CLEAN(result, poly, 64, 1);
-+ CLEAN(result, poly, 64, 2);
++#define AD (1234.5)
++#define BD (-0.0)
++#define CD (71.3)
++#define DD (-1024.4)
++float64_t expectedd2_1[2] = {AD * CD, BD * CD};
++float64_t expectedd2_2[2] = {AD * DD, BD * DD};
++float64_t _elemdC = CD;
++float64_t _elemdD = DD;
+
-+ TEST_VST1_LANE(, poly, p, 64, 1, 0);
-+ TEST_VST1_LANE(q, poly, p, 64, 2, 0);
+
-+ CHECK(TEST_MSG, poly, 64, 1, PRIx64, vst1_lane_expected, "");
-+ CHECK(TEST_MSG, poly, 64, 2, PRIx64, vst1_lane_expected, "");
++#define AS (1024)
++#define BS (-31)
++#define CS (0)
++#define DS (655)
++int32_t expecteds2_1[2] = {AS * AS, BS * AS};
++int32_t expecteds2_2[2] = {AS * BS, BS * BS};
++int32_t expecteds4_1[4] = {AS * AS, BS * AS, CS * AS, DS * AS};
++int32_t expecteds4_2[4] = {AS * BS, BS * BS, CS * BS, DS * BS};
++int32_t expecteds4_3[4] = {AS * CS, BS * CS, CS * CS, DS * CS};
++int32_t expecteds4_4[4] = {AS * DS, BS * DS, CS * DS, DS * DS};
++int32_t _elemsA = AS;
++int32_t _elemsB = BS;
++int32_t _elemsC = CS;
++int32_t _elemsD = DS;
+
-+ return 0;
-+}
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vfms_vfma_n.c
-@@ -0,0 +1,490 @@
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++#define AH ((int16_t) 0)
++#define BH ((int16_t) -32)
++#define CH ((int16_t) 102)
++#define DH ((int16_t) -51)
++#define EH ((int16_t) 71)
++#define FH ((int16_t) -91)
++#define GH ((int16_t) 48)
++#define HH ((int16_t) 255)
++int16_t expectedh4_1[4] = {AH * AH, BH * AH, CH * AH, DH * AH};
++int16_t expectedh4_2[4] = {AH * BH, BH * BH, CH * BH, DH * BH};
++int16_t expectedh4_3[4] = {AH * CH, BH * CH, CH * CH, DH * CH};
++int16_t expectedh4_4[4] = {AH * DH, BH * DH, CH * DH, DH * DH};
++int16_t expectedh8_1[8] = {AH * AH, BH * AH, CH * AH, DH * AH,
++ EH * AH, FH * AH, GH * AH, HH * AH};
++int16_t expectedh8_2[8] = {AH * BH, BH * BH, CH * BH, DH * BH,
++ EH * BH, FH * BH, GH * BH, HH * BH};
++int16_t expectedh8_3[8] = {AH * CH, BH * CH, CH * CH, DH * CH,
++ EH * CH, FH * CH, GH * CH, HH * CH};
++int16_t expectedh8_4[8] = {AH * DH, BH * DH, CH * DH, DH * DH,
++ EH * DH, FH * DH, GH * DH, HH * DH};
++int16_t expectedh8_5[8] = {AH * EH, BH * EH, CH * EH, DH * EH,
++ EH * EH, FH * EH, GH * EH, HH * EH};
++int16_t expectedh8_6[8] = {AH * FH, BH * FH, CH * FH, DH * FH,
++ EH * FH, FH * FH, GH * FH, HH * FH};
++int16_t expectedh8_7[8] = {AH * GH, BH * GH, CH * GH, DH * GH,
++ EH * GH, FH * GH, GH * GH, HH * GH};
++int16_t expectedh8_8[8] = {AH * HH, BH * HH, CH * HH, DH * HH,
++ EH * HH, FH * HH, GH * HH, HH * HH};
++int16_t _elemhA = AH;
++int16_t _elemhB = BH;
++int16_t _elemhC = CH;
++int16_t _elemhD = DH;
++int16_t _elemhE = EH;
++int16_t _elemhF = FH;
++int16_t _elemhG = GH;
++int16_t _elemhH = HH;
+
-+#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
++#define AUS (1024)
++#define BUS (31)
++#define CUS (0)
++#define DUS (655)
++uint32_t expectedus2_1[2] = {AUS * AUS, BUS * AUS};
++uint32_t expectedus2_2[2] = {AUS * BUS, BUS * BUS};
++uint32_t expectedus4_1[4] = {AUS * AUS, BUS * AUS, CUS * AUS, DUS * AUS};
++uint32_t expectedus4_2[4] = {AUS * BUS, BUS * BUS, CUS * BUS, DUS * BUS};
++uint32_t expectedus4_3[4] = {AUS * CUS, BUS * CUS, CUS * CUS, DUS * CUS};
++uint32_t expectedus4_4[4] = {AUS * DUS, BUS * DUS, CUS * DUS, DUS * DUS};
++uint32_t _elemusA = AUS;
++uint32_t _elemusB = BUS;
++uint32_t _elemusC = CUS;
++uint32_t _elemusD = DUS;
+
-+#define A0 123.4f
-+#define A1 -3.8f
-+#define A2 -29.4f
-+#define A3 (__builtin_inff ())
-+#define A4 0.0f
-+#define A5 24.0f
-+#define A6 124.0f
-+#define A7 1024.0f
++#define AUH ((uint16_t) 0)
++#define BUH ((uint16_t) 32)
++#define CUH ((uint16_t) 102)
++#define DUH ((uint16_t) 51)
++#define EUH ((uint16_t) 71)
++#define FUH ((uint16_t) 91)
++#define GUH ((uint16_t) 48)
++#define HUH ((uint16_t) 255)
++uint16_t expecteduh4_1[4] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH};
++uint16_t expecteduh4_2[4] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH};
++uint16_t expecteduh4_3[4] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH};
++uint16_t expecteduh4_4[4] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH};
++uint16_t expecteduh8_1[8] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH,
++ EUH * AUH, FUH * AUH, GUH * AUH, HUH * AUH};
++uint16_t expecteduh8_2[8] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH,
++ EUH * BUH, FUH * BUH, GUH * BUH, HUH * BUH};
++uint16_t expecteduh8_3[8] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH,
++ EUH * CUH, FUH * CUH, GUH * CUH, HUH * CUH};
++uint16_t expecteduh8_4[8] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH,
++ EUH * DUH, FUH * DUH, GUH * DUH, HUH * DUH};
++uint16_t expecteduh8_5[8] = {AUH * EUH, BUH * EUH, CUH * EUH, DUH * EUH,
++ EUH * EUH, FUH * EUH, GUH * EUH, HUH * EUH};
++uint16_t expecteduh8_6[8] = {AUH * FUH, BUH * FUH, CUH * FUH, DUH * FUH,
++ EUH * FUH, FUH * FUH, GUH * FUH, HUH * FUH};
++uint16_t expecteduh8_7[8] = {AUH * GUH, BUH * GUH, CUH * GUH, DUH * GUH,
++ EUH * GUH, FUH * GUH, GUH * GUH, HUH * GUH};
++uint16_t expecteduh8_8[8] = {AUH * HUH, BUH * HUH, CUH * HUH, DUH * HUH,
++ EUH * HUH, FUH * HUH, GUH * HUH, HUH * HUH};
++uint16_t _elemuhA = AUH;
++uint16_t _elemuhB = BUH;
++uint16_t _elemuhC = CUH;
++uint16_t _elemuhD = DUH;
++uint16_t _elemuhE = EUH;
++uint16_t _elemuhF = FUH;
++uint16_t _elemuhG = GUH;
++uint16_t _elemuhH = HUH;
+
-+#define B0 -5.8f
-+#define B1 -0.0f
-+#define B2 -10.8f
-+#define B3 10.0f
-+#define B4 23.4f
-+#define B5 -1234.8f
-+#define B6 8.9f
-+#define B7 4.0f
++void
++check_v2sf (float32_t elemA, float32_t elemB)
++{
++ int32_t indx;
++ const float32_t vec32x2_buf[2] = {A, B};
++ float32x2_t vec32x2_src = vld1_f32 (vec32x2_buf);
++ float32_t vec32x2_res[2];
+
-+#define E0 9.8f
-+#define E1 -1024.0f
-+#define E2 (-__builtin_inff ())
-+#define E3 479.0f
-+float32_t elem0 = E0;
-+float32_t elem1 = E1;
-+float32_t elem2 = E2;
-+float32_t elem3 = E3;
++ vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemA));
+
-+#define DA0 1231234.4
-+#define DA1 -3.8
-+#define DA2 -2980.4
-+#define DA3 -5.8
-+#define DA4 0.01123
-+#define DA5 24.0
-+#define DA6 124.12345
-+#define DA7 1024.0
++ for (indx = 0; indx < 2; indx++)
++ if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_1[indx])
++ abort ();
+
-+#define DB0 -5.8
-+#define DB1 (__builtin_inf ())
-+#define DB2 -105.8
-+#define DB3 10.0
-+#define DB4 (-__builtin_inf ())
-+#define DB5 -1234.8
-+#define DB6 848.9
-+#define DB7 44444.0
++ vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemB));
+
-+#define DE0 9.8
-+#define DE1 -1024.0
-+#define DE2 105.8
-+#define DE3 479.0
-+float64_t delem0 = DE0;
-+float64_t delem1 = DE1;
-+float64_t delem2 = DE2;
-+float64_t delem3 = DE3;
++ for (indx = 0; indx < 2; indx++)
++ if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_2[indx])
++ abort ();
+
-+/* Expected results for vfms_n. */
++/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 2 } } */
++}
+
-+VECT_VAR_DECL(expectedfms0, float, 32, 2) [] = {A0 + -B0 * E0, A1 + -B1 * E0};
-+VECT_VAR_DECL(expectedfms1, float, 32, 2) [] = {A2 + -B2 * E1, A3 + -B3 * E1};
-+VECT_VAR_DECL(expectedfms2, float, 32, 2) [] = {A4 + -B4 * E2, A5 + -B5 * E2};
-+VECT_VAR_DECL(expectedfms3, float, 32, 2) [] = {A6 + -B6 * E3, A7 + -B7 * E3};
-+VECT_VAR_DECL(expectedfma0, float, 32, 2) [] = {A0 + B0 * E0, A1 + B1 * E0};
-+VECT_VAR_DECL(expectedfma1, float, 32, 2) [] = {A2 + B2 * E1, A3 + B3 * E1};
-+VECT_VAR_DECL(expectedfma2, float, 32, 2) [] = {A4 + B4 * E2, A5 + B5 * E2};
-+VECT_VAR_DECL(expectedfma3, float, 32, 2) [] = {A6 + B6 * E3, A7 + B7 * E3};
++void
++check_v4sf (float32_t elemA, float32_t elemB, float32_t elemC, float32_t elemD)
++{
++ int32_t indx;
++ const float32_t vec32x4_buf[4] = {A, B, C, D};
++ float32x4_t vec32x4_src = vld1q_f32 (vec32x4_buf);
++ float32_t vec32x4_res[4];
+
-+hfloat32_t * VECT_VAR (expectedfms0_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfms0, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfms1_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfms1, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfms2_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfms2, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfms3_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfms3, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfma0_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfma0, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfma1_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfma1, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfma2_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfma2, float, 32, 2);
-+hfloat32_t * VECT_VAR (expectedfma3_static, hfloat, 32, 2) =
-+ (hfloat32_t *) VECT_VAR (expectedfma3, float, 32, 2);
++ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemA));
+
++ for (indx = 0; indx < 4; indx++)
++ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_1[indx])
++ abort ();
+
-+VECT_VAR_DECL(expectedfms0, float, 32, 4) [] = {A0 + -B0 * E0, A1 + -B1 * E0,
-+ A2 + -B2 * E0, A3 + -B3 * E0};
-+VECT_VAR_DECL(expectedfms1, float, 32, 4) [] = {A4 + -B4 * E1, A5 + -B5 * E1,
-+ A6 + -B6 * E1, A7 + -B7 * E1};
-+VECT_VAR_DECL(expectedfms2, float, 32, 4) [] = {A0 + -B0 * E2, A2 + -B2 * E2,
-+ A4 + -B4 * E2, A6 + -B6 * E2};
-+VECT_VAR_DECL(expectedfms3, float, 32, 4) [] = {A1 + -B1 * E3, A3 + -B3 * E3,
-+ A5 + -B5 * E3, A7 + -B7 * E3};
-+VECT_VAR_DECL(expectedfma0, float, 32, 4) [] = {A0 + B0 * E0, A1 + B1 * E0,
-+ A2 + B2 * E0, A3 + B3 * E0};
-+VECT_VAR_DECL(expectedfma1, float, 32, 4) [] = {A4 + B4 * E1, A5 + B5 * E1,
-+ A6 + B6 * E1, A7 + B7 * E1};
-+VECT_VAR_DECL(expectedfma2, float, 32, 4) [] = {A0 + B0 * E2, A2 + B2 * E2,
-+ A4 + B4 * E2, A6 + B6 * E2};
-+VECT_VAR_DECL(expectedfma3, float, 32, 4) [] = {A1 + B1 * E3, A3 + B3 * E3,
-+ A5 + B5 * E3, A7 + B7 * E3};
++ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemB));
+
-+hfloat32_t * VECT_VAR (expectedfms0_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfms0, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfms1_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfms1, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfms2_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfms2, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfms3_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfms3, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfma0_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfma0, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfma1_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfma1, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfma2_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfma2, float, 32, 4);
-+hfloat32_t * VECT_VAR (expectedfma3_static, hfloat, 32, 4) =
-+ (hfloat32_t *) VECT_VAR (expectedfma3, float, 32, 4);
++ for (indx = 0; indx < 4; indx++)
++ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_2[indx])
++ abort ();
+
-+VECT_VAR_DECL(expectedfms0, float, 64, 2) [] = {DA0 + -DB0 * DE0,
-+ DA1 + -DB1 * DE0};
-+VECT_VAR_DECL(expectedfms1, float, 64, 2) [] = {DA2 + -DB2 * DE1,
-+ DA3 + -DB3 * DE1};
-+VECT_VAR_DECL(expectedfms2, float, 64, 2) [] = {DA4 + -DB4 * DE2,
-+ DA5 + -DB5 * DE2};
-+VECT_VAR_DECL(expectedfms3, float, 64, 2) [] = {DA6 + -DB6 * DE3,
-+ DA7 + -DB7 * DE3};
-+VECT_VAR_DECL(expectedfma0, float, 64, 2) [] = {DA0 + DB0 * DE0,
-+ DA1 + DB1 * DE0};
-+VECT_VAR_DECL(expectedfma1, float, 64, 2) [] = {DA2 + DB2 * DE1,
-+ DA3 + DB3 * DE1};
-+VECT_VAR_DECL(expectedfma2, float, 64, 2) [] = {DA4 + DB4 * DE2,
-+ DA5 + DB5 * DE2};
-+VECT_VAR_DECL(expectedfma3, float, 64, 2) [] = {DA6 + DB6 * DE3,
-+ DA7 + DB7 * DE3};
-+hfloat64_t * VECT_VAR (expectedfms0_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfms0, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfms1_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfms1, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfms2_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfms2, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfms3_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfms3, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfma0_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfma0, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfma1_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfma1, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfma2_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfma2, float, 64, 2);
-+hfloat64_t * VECT_VAR (expectedfma3_static, hfloat, 64, 2) =
-+ (hfloat64_t *) VECT_VAR (expectedfma3, float, 64, 2);
++ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemC));
+
-+VECT_VAR_DECL(expectedfms0, float, 64, 1) [] = {DA0 + -DB0 * DE0};
-+VECT_VAR_DECL(expectedfms1, float, 64, 1) [] = {DA2 + -DB2 * DE1};
-+VECT_VAR_DECL(expectedfms2, float, 64, 1) [] = {DA4 + -DB4 * DE2};
-+VECT_VAR_DECL(expectedfms3, float, 64, 1) [] = {DA6 + -DB6 * DE3};
-+VECT_VAR_DECL(expectedfma0, float, 64, 1) [] = {DA0 + DB0 * DE0};
-+VECT_VAR_DECL(expectedfma1, float, 64, 1) [] = {DA2 + DB2 * DE1};
-+VECT_VAR_DECL(expectedfma2, float, 64, 1) [] = {DA4 + DB4 * DE2};
-+VECT_VAR_DECL(expectedfma3, float, 64, 1) [] = {DA6 + DB6 * DE3};
++ for (indx = 0; indx < 4; indx++)
++ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_3[indx])
++ abort ();
+
-+hfloat64_t * VECT_VAR (expectedfms0_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfms0, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfms1_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfms1, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfms2_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfms2, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfms3_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfms3, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfma0_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfma0, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfma1_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfma1, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfma2_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfma2, float, 64, 1);
-+hfloat64_t * VECT_VAR (expectedfma3_static, hfloat, 64, 1) =
-+ (hfloat64_t *) VECT_VAR (expectedfma3, float, 64, 1);
++ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemD));
+
-+void exec_vfma_vfms_n (void)
++ for (indx = 0; indx < 4; indx++)
++ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_4[indx])
++ abort ();
++
++/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
++}
++
++void
++check_v2df (float64_t elemdC, float64_t elemdD)
+{
-+#undef TEST_MSG
-+#define TEST_MSG "VFMS_VFMA_N (FP32)"
-+ clean_results ();
++ int32_t indx;
++ const float64_t vec64x2_buf[2] = {AD, BD};
++ float64x2_t vec64x2_src = vld1q_f64 (vec64x2_buf);
++ float64_t vec64x2_res[2];
+
-+ DECL_VARIABLE(vsrc_1, float, 32, 2);
-+ DECL_VARIABLE(vsrc_2, float, 32, 2);
-+ VECT_VAR_DECL (buf_src_1, float, 32, 2) [] = {A0, A1};
-+ VECT_VAR_DECL (buf_src_2, float, 32, 2) [] = {B0, B1};
-+ VLOAD (vsrc_1, buf_src_1, , float, f, 32, 2);
-+ VLOAD (vsrc_2, buf_src_2, , float, f, 32, 2);
-+ DECL_VARIABLE (vector_res, float, 32, 2) =
-+ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem0);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms0_static, "");
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem0);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma0_static, "");
++ vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdC));
+
-+ VECT_VAR_DECL (buf_src_3, float, 32, 2) [] = {A2, A3};
-+ VECT_VAR_DECL (buf_src_4, float, 32, 2) [] = {B2, B3};
-+ VLOAD (vsrc_1, buf_src_3, , float, f, 32, 2);
-+ VLOAD (vsrc_2, buf_src_4, , float, f, 32, 2);
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem1);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms1_static, "");
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem1);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma1_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_1[indx])
++ abort ();
+
-+ VECT_VAR_DECL (buf_src_5, float, 32, 2) [] = {A4, A5};
-+ VECT_VAR_DECL (buf_src_6, float, 32, 2) [] = {B4, B5};
-+ VLOAD (vsrc_1, buf_src_5, , float, f, 32, 2);
-+ VLOAD (vsrc_2, buf_src_6, , float, f, 32, 2);
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem2);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms2_static, "");
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem2);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma2_static, "");
++ vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdD));
+
-+ VECT_VAR_DECL (buf_src_7, float, 32, 2) [] = {A6, A7};
-+ VECT_VAR_DECL (buf_src_8, float, 32, 2) [] = {B6, B7};
-+ VLOAD (vsrc_1, buf_src_7, , float, f, 32, 2);
-+ VLOAD (vsrc_2, buf_src_8, , float, f, 32, 2);
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfms_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem3);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfms3_static, "");
-+ VECT_VAR (vector_res, float, 32, 2) =
-+ vfma_n_f32 (VECT_VAR (vsrc_1, float, 32, 2),
-+ VECT_VAR (vsrc_2, float, 32, 2), elem3);
-+ vst1_f32 (VECT_VAR (result, float, 32, 2),
-+ VECT_VAR (vector_res, float, 32, 2));
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx16, expectedfma3_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_2[indx])
++ abort ();
++
++/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[0\\\]" 2 } } */
++}
++
++void
++check_v2si (int32_t elemsA, int32_t elemsB)
++{
++ int32_t indx;
++ const int32_t vecs32x2_buf[2] = {AS, BS};
++ int32x2_t vecs32x2_src = vld1_s32 (vecs32x2_buf);
++ int32_t vecs32x2_res[2];
+
-+#undef TEST_MSG
-+#define TEST_MSG "VFMSQ_VFMAQ_N (FP32)"
-+ clean_results ();
++ vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsA));
+
-+ DECL_VARIABLE(vsrc_1, float, 32, 4);
-+ DECL_VARIABLE(vsrc_2, float, 32, 4);
-+ VECT_VAR_DECL (buf_src_1, float, 32, 4) [] = {A0, A1, A2, A3};
-+ VECT_VAR_DECL (buf_src_2, float, 32, 4) [] = {B0, B1, B2, B3};
-+ VLOAD (vsrc_1, buf_src_1, q, float, f, 32, 4);
-+ VLOAD (vsrc_2, buf_src_2, q, float, f, 32, 4);
-+ DECL_VARIABLE (vector_res, float, 32, 4) =
-+ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem0);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms0_static, "");
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem0);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma0_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (vecs32x2_res[indx] != expecteds2_1[indx])
++ abort ();
+
-+ VECT_VAR_DECL (buf_src_3, float, 32, 4) [] = {A4, A5, A6, A7};
-+ VECT_VAR_DECL (buf_src_4, float, 32, 4) [] = {B4, B5, B6, B7};
-+ VLOAD (vsrc_1, buf_src_3, q, float, f, 32, 4);
-+ VLOAD (vsrc_2, buf_src_4, q, float, f, 32, 4);
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem1);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms1_static, "");
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem1);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma1_static, "");
++ vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsB));
+
-+ VECT_VAR_DECL (buf_src_5, float, 32, 4) [] = {A0, A2, A4, A6};
-+ VECT_VAR_DECL (buf_src_6, float, 32, 4) [] = {B0, B2, B4, B6};
-+ VLOAD (vsrc_1, buf_src_5, q, float, f, 32, 4);
-+ VLOAD (vsrc_2, buf_src_6, q, float, f, 32, 4);
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem2);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms2_static, "");
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem2);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma2_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (vecs32x2_res[indx] != expecteds2_2[indx])
++ abort ();
++}
+
-+ VECT_VAR_DECL (buf_src_7, float, 32, 4) [] = {A1, A3, A5, A7};
-+ VECT_VAR_DECL (buf_src_8, float, 32, 4) [] = {B1, B3, B5, B7};
-+ VLOAD (vsrc_1, buf_src_7, q, float, f, 32, 4);
-+ VLOAD (vsrc_2, buf_src_8, q, float, f, 32, 4);
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmsq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem3);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfms3_static, "");
-+ VECT_VAR (vector_res, float, 32, 4) =
-+ vfmaq_n_f32 (VECT_VAR (vsrc_1, float, 32, 4),
-+ VECT_VAR (vsrc_2, float, 32, 4), elem3);
-+ vst1q_f32 (VECT_VAR (result, float, 32, 4),
-+ VECT_VAR (vector_res, float, 32, 4));
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx16, expectedfma3_static, "");
++void
++check_v2si_unsigned (uint32_t elemusA, uint32_t elemusB)
++{
++ int indx;
++ const uint32_t vecus32x2_buf[2] = {AUS, BUS};
++ uint32x2_t vecus32x2_src = vld1_u32 (vecus32x2_buf);
++ uint32_t vecus32x2_res[2];
+
-+#undef TEST_MSG
-+#define TEST_MSG "VFMSQ_VFMAQ_N (FP64)"
-+ clean_results ();
++ vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusA));
+
-+ DECL_VARIABLE(vsrc_1, float, 64, 2);
-+ DECL_VARIABLE(vsrc_2, float, 64, 2);
-+ VECT_VAR_DECL (buf_src_1, float, 64, 2) [] = {DA0, DA1};
-+ VECT_VAR_DECL (buf_src_2, float, 64, 2) [] = {DB0, DB1};
-+ VLOAD (vsrc_1, buf_src_1, q, float, f, 64, 2);
-+ VLOAD (vsrc_2, buf_src_2, q, float, f, 64, 2);
-+ DECL_VARIABLE (vector_res, float, 64, 2) =
-+ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem0);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms0_static, "");
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem0);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma0_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (vecus32x2_res[indx] != expectedus2_1[indx])
++ abort ();
+
-+ VECT_VAR_DECL (buf_src_3, float, 64, 2) [] = {DA2, DA3};
-+ VECT_VAR_DECL (buf_src_4, float, 64, 2) [] = {DB2, DB3};
-+ VLOAD (vsrc_1, buf_src_3, q, float, f, 64, 2);
-+ VLOAD (vsrc_2, buf_src_4, q, float, f, 64, 2);
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem1);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms1_static, "");
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem1);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma1_static, "");
++ vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusB));
+
-+ VECT_VAR_DECL (buf_src_5, float, 64, 2) [] = {DA4, DA5};
-+ VECT_VAR_DECL (buf_src_6, float, 64, 2) [] = {DB4, DB5};
-+ VLOAD (vsrc_1, buf_src_5, q, float, f, 64, 2);
-+ VLOAD (vsrc_2, buf_src_6, q, float, f, 64, 2);
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem2);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms2_static, "");
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem2);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma2_static, "");
++ for (indx = 0; indx < 2; indx++)
++ if (vecus32x2_res[indx] != expectedus2_2[indx])
++ abort ();
+
-+ VECT_VAR_DECL (buf_src_7, float, 64, 2) [] = {DA6, DA7};
-+ VECT_VAR_DECL (buf_src_8, float, 64, 2) [] = {DB6, DB7};
-+ VLOAD (vsrc_1, buf_src_7, q, float, f, 64, 2);
-+ VLOAD (vsrc_2, buf_src_8, q, float, f, 64, 2);
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmsq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem3);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfms3_static, "");
-+ VECT_VAR (vector_res, float, 64, 2) =
-+ vfmaq_n_f64 (VECT_VAR (vsrc_1, float, 64, 2),
-+ VECT_VAR (vsrc_2, float, 64, 2), delem3);
-+ vst1q_f64 (VECT_VAR (result, float, 64, 2),
-+ VECT_VAR (vector_res, float, 64, 2));
-+ CHECK_FP (TEST_MSG, float, 64, 2, PRIx16, expectedfma3_static, "");
++/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
++}
+
-+#undef TEST_MSG
-+#define TEST_MSG "VFMS_VFMA_N (FP64)"
-+ clean_results ();
++void
++check_v4si (int32_t elemsA, int32_t elemsB, int32_t elemsC, int32_t elemsD)
++{
++ int32_t indx;
++ const int32_t vecs32x4_buf[4] = {AS, BS, CS, DS};
++ int32x4_t vecs32x4_src = vld1q_s32 (vecs32x4_buf);
++ int32_t vecs32x4_res[4];
+
-+ DECL_VARIABLE(vsrc_1, float, 64, 1);
-+ DECL_VARIABLE(vsrc_2, float, 64, 1);
-+ VECT_VAR_DECL (buf_src_1, float, 64, 1) [] = {DA0};
-+ VECT_VAR_DECL (buf_src_2, float, 64, 1) [] = {DB0};
-+ VLOAD (vsrc_1, buf_src_1, , float, f, 64, 1);
-+ VLOAD (vsrc_2, buf_src_2, , float, f, 64, 1);
-+ DECL_VARIABLE (vector_res, float, 64, 1) =
-+ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem0);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms0_static, "");
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem0);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma0_static, "");
++ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsA));
+
-+ VECT_VAR_DECL (buf_src_3, float, 64, 1) [] = {DA2};
-+ VECT_VAR_DECL (buf_src_4, float, 64, 1) [] = {DB2};
-+ VLOAD (vsrc_1, buf_src_3, , float, f, 64, 1);
-+ VLOAD (vsrc_2, buf_src_4, , float, f, 64, 1);
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem1);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms1_static, "");
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem1);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma1_static, "");
++ for (indx = 0; indx < 4; indx++)
++ if (vecs32x4_res[indx] != expecteds4_1[indx])
++ abort ();
+
-+ VECT_VAR_DECL (buf_src_5, float, 64, 1) [] = {DA4};
-+ VECT_VAR_DECL (buf_src_6, float, 64, 1) [] = {DB4};
-+ VLOAD (vsrc_1, buf_src_5, , float, f, 64, 1);
-+ VLOAD (vsrc_2, buf_src_6, , float, f, 64, 1);
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem2);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms2_static, "");
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem2);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma2_static, "");
++ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsB));
+
-+ VECT_VAR_DECL (buf_src_7, float, 64, 1) [] = {DA6};
-+ VECT_VAR_DECL (buf_src_8, float, 64, 1) [] = {DB6};
-+ VLOAD (vsrc_1, buf_src_7, , float, f, 64, 1);
-+ VLOAD (vsrc_2, buf_src_8, , float, f, 64, 1);
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfms_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem3);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfms3_static, "");
-+ VECT_VAR (vector_res, float, 64, 1) =
-+ vfma_n_f64 (VECT_VAR (vsrc_1, float, 64, 1),
-+ VECT_VAR (vsrc_2, float, 64, 1), delem3);
-+ vst1_f64 (VECT_VAR (result, float, 64, 1),
-+ VECT_VAR (vector_res, float, 64, 1));
-+ CHECK_FP (TEST_MSG, float, 64, 1, PRIx16, expectedfma3_static, "");
++ for (indx = 0; indx < 4; indx++)
++ if (vecs32x4_res[indx] != expecteds4_2[indx])
++ abort ();
++
++ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsC));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecs32x4_res[indx] != expecteds4_3[indx])
++ abort ();
++
++ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsD));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecs32x4_res[indx] != expecteds4_4[indx])
++ abort ();
+}
-+#endif
+
-+int
-+main (void)
++void
++check_v4si_unsigned (uint32_t elemusA, uint32_t elemusB, uint32_t elemusC,
++ uint32_t elemusD)
+{
-+#if defined(__aarch64__) && defined(__ARM_FEATURE_FMA)
-+ exec_vfma_vfms_n ();
-+#endif
-+ return 0;
-+}
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
-@@ -13,6 +13,7 @@ uint32_t expected_u32 = 0xfffffff1;
- uint64_t expected_u64 = 0xfffffffffffffff0;
- poly8_t expected_p8 = 0xf6;
- poly16_t expected_p16 = 0xfff2;
-+hfloat16_t expected_f16 = 0xcb80;
- hfloat32_t expected_f32 = 0xc1700000;
-
- int8_t expectedq_s8 = 0xff;
-@@ -25,6 +26,7 @@ uint32_t expectedq_u32 = 0xfffffff2;
- uint64_t expectedq_u64 = 0xfffffffffffffff1;
- poly8_t expectedq_p8 = 0xfe;
- poly16_t expectedq_p16 = 0xfff6;
-+hfloat16_t expectedq_f16 = 0xca80;
- hfloat32_t expectedq_f32 = 0xc1500000;
-
- int error_found = 0;
-@@ -52,6 +54,10 @@ void exec_vget_lane (void)
- uint32_t var_int32;
- float32_t var_float32;
- } var_int32_float32;
-+ union {
-+ uint16_t var_int16;
-+ float16_t var_float16;
-+ } var_int16_float16;
-
- #define TEST_VGET_LANE_FP(Q, T1, T2, W, N, L) \
- VAR(var, T1, W) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
-@@ -81,10 +87,17 @@ void exec_vget_lane (void)
- VAR_DECL(var, uint, 64);
- VAR_DECL(var, poly, 8);
- VAR_DECL(var, poly, 16);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ VAR_DECL(var, float, 16);
-+#endif
- VAR_DECL(var, float, 32);
-
- /* Initialize input values. */
- TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ VLOAD(vector, buffer, , float, f, 16, 4);
-+ VLOAD(vector, buffer, q, float, f, 16, 8);
-+#endif
- VLOAD(vector, buffer, , float, f, 32, 2);
- VLOAD(vector, buffer, q, float, f, 32, 4);
-
-@@ -99,6 +112,9 @@ void exec_vget_lane (void)
- TEST_VGET_LANE(, uint, u, 64, 1, 0);
- TEST_VGET_LANE(, poly, p, 8, 8, 6);
- TEST_VGET_LANE(, poly, p, 16, 4, 2);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ TEST_VGET_LANE_FP(, float, f, 16, 4, 1);
-+#endif
- TEST_VGET_LANE_FP(, float, f, 32, 2, 1);
-
- TEST_VGET_LANE(q, int, s, 8, 16, 15);
-@@ -111,6 +127,9 @@ void exec_vget_lane (void)
- TEST_VGET_LANE(q, uint, u, 64, 2, 1);
- TEST_VGET_LANE(q, poly, p, 8, 16, 14);
- TEST_VGET_LANE(q, poly, p, 16, 8, 6);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ TEST_VGET_LANE_FP(q, float, f, 16, 8, 3);
-+#endif
- TEST_VGET_LANE_FP(q, float, f, 32, 4, 3);
- }
-
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul.c
-@@ -37,10 +37,8 @@ VECT_VAR_DECL(expected,poly,8,16) [] = { 0x60, 0xca, 0x34, 0x9e,
- VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4c73333, 0xc4bac000,
- 0xc4ae4ccd, 0xc4a1d999 };
-
--#ifndef INSN_NAME
- #define INSN_NAME vmul
- #define TEST_MSG "VMUL"
--#endif
-
- #define FNNAME1(NAME) exec_ ## NAME
- #define FNNAME(NAME) FNNAME1(NAME)
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
-@@ -21,6 +21,8 @@ VECT_VAR_DECL(expected_s8_8,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
- 0xf4, 0xf5, 0xf6, 0xf7 };
- VECT_VAR_DECL(expected_s8_9,int,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
- 0xf2, 0xff, 0xf3, 0xff };
-+VECT_VAR_DECL(expected_s8_10,int,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca };
-
- /* Expected results for vreinterpret_s16_xx. */
- VECT_VAR_DECL(expected_s16_1,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-@@ -32,6 +34,7 @@ VECT_VAR_DECL(expected_s16_6,int,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
- VECT_VAR_DECL(expected_s16_7,int,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
- VECT_VAR_DECL(expected_s16_8,int,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
- VECT_VAR_DECL(expected_s16_9,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
-+VECT_VAR_DECL(expected_s16_10,int,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
-
- /* Expected results for vreinterpret_s32_xx. */
- VECT_VAR_DECL(expected_s32_1,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
-@@ -43,6 +46,7 @@ VECT_VAR_DECL(expected_s32_6,int,32,2) [] = { 0xfffffff0, 0xfffffff1 };
- VECT_VAR_DECL(expected_s32_7,int,32,2) [] = { 0xfffffff0, 0xffffffff };
- VECT_VAR_DECL(expected_s32_8,int,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
- VECT_VAR_DECL(expected_s32_9,int,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
-+VECT_VAR_DECL(expected_s32_10,int,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
-
- /* Expected results for vreinterpret_s64_xx. */
- VECT_VAR_DECL(expected_s64_1,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
-@@ -54,6 +58,7 @@ VECT_VAR_DECL(expected_s64_6,int,64,1) [] = { 0xfffffff1fffffff0 };
- VECT_VAR_DECL(expected_s64_7,int,64,1) [] = { 0xfffffffffffffff0 };
- VECT_VAR_DECL(expected_s64_8,int,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
- VECT_VAR_DECL(expected_s64_9,int,64,1) [] = { 0xfff3fff2fff1fff0 };
-+VECT_VAR_DECL(expected_s64_10,int,64,1) [] = { 0xca80cb00cb80cc00 };
-
- /* Expected results for vreinterpret_u8_xx. */
- VECT_VAR_DECL(expected_u8_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
-@@ -74,6 +79,8 @@ VECT_VAR_DECL(expected_u8_8,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
- 0xf4, 0xf5, 0xf6, 0xf7 };
- VECT_VAR_DECL(expected_u8_9,uint,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
- 0xf2, 0xff, 0xf3, 0xff };
-+VECT_VAR_DECL(expected_u8_10,uint,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca };
-
- /* Expected results for vreinterpret_u16_xx. */
- VECT_VAR_DECL(expected_u16_1,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-@@ -85,6 +92,7 @@ VECT_VAR_DECL(expected_u16_6,uint,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
- VECT_VAR_DECL(expected_u16_7,uint,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
- VECT_VAR_DECL(expected_u16_8,uint,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
- VECT_VAR_DECL(expected_u16_9,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
-+VECT_VAR_DECL(expected_u16_10,uint,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
-
- /* Expected results for vreinterpret_u32_xx. */
- VECT_VAR_DECL(expected_u32_1,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
-@@ -96,6 +104,7 @@ VECT_VAR_DECL(expected_u32_6,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
- VECT_VAR_DECL(expected_u32_7,uint,32,2) [] = { 0xfffffff0, 0xffffffff };
- VECT_VAR_DECL(expected_u32_8,uint,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
- VECT_VAR_DECL(expected_u32_9,uint,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
-+VECT_VAR_DECL(expected_u32_10,uint,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
-
- /* Expected results for vreinterpret_u64_xx. */
- VECT_VAR_DECL(expected_u64_1,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
-@@ -107,6 +116,7 @@ VECT_VAR_DECL(expected_u64_6,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
- VECT_VAR_DECL(expected_u64_7,uint,64,1) [] = { 0xfffffff1fffffff0 };
- VECT_VAR_DECL(expected_u64_8,uint,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
- VECT_VAR_DECL(expected_u64_9,uint,64,1) [] = { 0xfff3fff2fff1fff0 };
-+VECT_VAR_DECL(expected_u64_10,uint,64,1) [] = { 0xca80cb00cb80cc00 };
-
- /* Expected results for vreinterpret_p8_xx. */
- VECT_VAR_DECL(expected_p8_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
-@@ -127,6 +137,8 @@ VECT_VAR_DECL(expected_p8_8,poly,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
- 0xff, 0xff, 0xff, 0xff };
- VECT_VAR_DECL(expected_p8_9,poly,8,8) [] = { 0xf0, 0xff, 0xf1, 0xff,
- 0xf2, 0xff, 0xf3, 0xff };
-+VECT_VAR_DECL(expected_p8_10,poly,8,8) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca };
-
- /* Expected results for vreinterpret_p16_xx. */
- VECT_VAR_DECL(expected_p16_1,poly,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-@@ -138,6 +150,7 @@ VECT_VAR_DECL(expected_p16_6,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
- VECT_VAR_DECL(expected_p16_7,poly,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
- VECT_VAR_DECL(expected_p16_8,poly,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
- VECT_VAR_DECL(expected_p16_9,poly,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-+VECT_VAR_DECL(expected_p16_10,poly,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
-
- /* Expected results for vreinterpretq_s8_xx. */
- VECT_VAR_DECL(expected_q_s8_1,int,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
-@@ -176,6 +189,10 @@ VECT_VAR_DECL(expected_q_s8_9,int,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
- 0xf2, 0xff, 0xf3, 0xff,
- 0xf4, 0xff, 0xf5, 0xff,
- 0xf6, 0xff, 0xf7, 0xff };
-+VECT_VAR_DECL(expected_q_s8_10,int,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca,
-+ 0x00, 0xca, 0x80, 0xc9,
-+ 0x00, 0xc9, 0x80, 0xc8 };
-
- /* Expected results for vreinterpretq_s16_xx. */
- VECT_VAR_DECL(expected_q_s16_1,int,16,8) [] = { 0xf1f0, 0xf3f2,
-@@ -214,6 +231,10 @@ VECT_VAR_DECL(expected_q_s16_9,int,16,8) [] = { 0xfff0, 0xfff1,
- 0xfff2, 0xfff3,
- 0xfff4, 0xfff5,
- 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_s16_10,int,16,8) [] = { 0xcc00, 0xcb80,
-+ 0xcb00, 0xca80,
-+ 0xca00, 0xc980,
-+ 0xc900, 0xc880 };
-
- /* Expected results for vreinterpretq_s32_xx. */
- VECT_VAR_DECL(expected_q_s32_1,int,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
-@@ -234,6 +255,8 @@ VECT_VAR_DECL(expected_q_s32_8,int,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
- 0xfbfaf9f8, 0xfffefdfc };
- VECT_VAR_DECL(expected_q_s32_9,int,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
- 0xfff5fff4, 0xfff7fff6 };
-+VECT_VAR_DECL(expected_q_s32_10,int,32,4) [] = { 0xcb80cc00, 0xca80cb00,
-+ 0xc980ca00, 0xc880c900 };
-
- /* Expected results for vreinterpretq_s64_xx. */
- VECT_VAR_DECL(expected_q_s64_1,int,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-@@ -254,6 +277,8 @@ VECT_VAR_DECL(expected_q_s64_8,int,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
- 0xfffefdfcfbfaf9f8 };
- VECT_VAR_DECL(expected_q_s64_9,int,64,2) [] = { 0xfff3fff2fff1fff0,
- 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(expected_q_s64_10,int,64,2) [] = { 0xca80cb00cb80cc00,
-+ 0xc880c900c980ca00 };
-
- /* Expected results for vreinterpretq_u8_xx. */
- VECT_VAR_DECL(expected_q_u8_1,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
-@@ -292,6 +317,10 @@ VECT_VAR_DECL(expected_q_u8_9,uint,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
- 0xf2, 0xff, 0xf3, 0xff,
- 0xf4, 0xff, 0xf5, 0xff,
- 0xf6, 0xff, 0xf7, 0xff };
-+VECT_VAR_DECL(expected_q_u8_10,uint,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca,
-+ 0x00, 0xca, 0x80, 0xc9,
-+ 0x00, 0xc9, 0x80, 0xc8 };
-
- /* Expected results for vreinterpretq_u16_xx. */
- VECT_VAR_DECL(expected_q_u16_1,uint,16,8) [] = { 0xf1f0, 0xf3f2,
-@@ -330,6 +359,10 @@ VECT_VAR_DECL(expected_q_u16_9,uint,16,8) [] = { 0xfff0, 0xfff1,
- 0xfff2, 0xfff3,
- 0xfff4, 0xfff5,
- 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_u16_10,uint,16,8) [] = { 0xcc00, 0xcb80,
-+ 0xcb00, 0xca80,
-+ 0xca00, 0xc980,
-+ 0xc900, 0xc880 };
-
- /* Expected results for vreinterpretq_u32_xx. */
- VECT_VAR_DECL(expected_q_u32_1,uint,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
-@@ -350,6 +383,8 @@ VECT_VAR_DECL(expected_q_u32_8,uint,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
- 0xfbfaf9f8, 0xfffefdfc };
- VECT_VAR_DECL(expected_q_u32_9,uint,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
- 0xfff5fff4, 0xfff7fff6 };
-+VECT_VAR_DECL(expected_q_u32_10,uint,32,4) [] = { 0xcb80cc00, 0xca80cb00,
-+ 0xc980ca00, 0xc880c900 };
-
- /* Expected results for vreinterpretq_u64_xx. */
- VECT_VAR_DECL(expected_q_u64_1,uint,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-@@ -370,6 +405,92 @@ VECT_VAR_DECL(expected_q_u64_8,uint,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
- 0xfffefdfcfbfaf9f8 };
- VECT_VAR_DECL(expected_q_u64_9,uint,64,2) [] = { 0xfff3fff2fff1fff0,
- 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(expected_q_u64_10,uint,64,2) [] = { 0xca80cb00cb80cc00,
-+ 0xc880c900c980ca00 };
++ int indx;
++ const uint32_t vecus32x4_buf[4] = {AUS, BUS, CUS, DUS};
++ uint32x4_t vecus32x4_src = vld1q_u32 (vecus32x4_buf);
++ uint32_t vecus32x4_res[4];
+
-+/* Expected results for vreinterpretq_p8_xx. */
-+VECT_VAR_DECL(expected_q_p8_1,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
-+ 0xf4, 0xf5, 0xf6, 0xf7,
-+ 0xf8, 0xf9, 0xfa, 0xfb,
-+ 0xfc, 0xfd, 0xfe, 0xff };
-+VECT_VAR_DECL(expected_q_p8_2,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
-+ 0xf2, 0xff, 0xf3, 0xff,
-+ 0xf4, 0xff, 0xf5, 0xff,
-+ 0xf6, 0xff, 0xf7, 0xff };
-+VECT_VAR_DECL(expected_q_p8_3,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xf2, 0xff, 0xff, 0xff,
-+ 0xf3, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_q_p8_4,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_q_p8_5,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
-+ 0xf4, 0xf5, 0xf6, 0xf7,
-+ 0xf8, 0xf9, 0xfa, 0xfb,
-+ 0xfc, 0xfd, 0xfe, 0xff };
-+VECT_VAR_DECL(expected_q_p8_6,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
-+ 0xf2, 0xff, 0xf3, 0xff,
-+ 0xf4, 0xff, 0xf5, 0xff,
-+ 0xf6, 0xff, 0xf7, 0xff };
-+VECT_VAR_DECL(expected_q_p8_7,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xf2, 0xff, 0xff, 0xff,
-+ 0xf3, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_q_p8_8,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_q_p8_9,poly,8,16) [] = { 0xf0, 0xff, 0xf1, 0xff,
-+ 0xf2, 0xff, 0xf3, 0xff,
-+ 0xf4, 0xff, 0xf5, 0xff,
-+ 0xf6, 0xff, 0xf7, 0xff };
-+VECT_VAR_DECL(expected_q_p8_10,poly,8,16) [] = { 0x00, 0xcc, 0x80, 0xcb,
-+ 0x00, 0xcb, 0x80, 0xca,
-+ 0x00, 0xca, 0x80, 0xc9,
-+ 0x00, 0xc9, 0x80, 0xc8 };
++ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusA));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecus32x4_res[indx] != expectedus4_1[indx])
++ abort ();
++
++ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusB));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecus32x4_res[indx] != expectedus4_2[indx])
++ abort ();
++
++ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusC));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecus32x4_res[indx] != expectedus4_3[indx])
++ abort ();
++
++ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusD));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecus32x4_res[indx] != expectedus4_4[indx])
++ abort ();
++
++/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 8 } } */
++}
++
++
++void
++check_v4hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD)
++{
++ int32_t indx;
++ const int16_t vech16x4_buf[4] = {AH, BH, CH, DH};
++ int16x4_t vech16x4_src = vld1_s16 (vech16x4_buf);
++ int16_t vech16x4_res[4];
++
++ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhA));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vech16x4_res[indx] != expectedh4_1[indx])
++ abort ();
++
++ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhB));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vech16x4_res[indx] != expectedh4_2[indx])
++ abort ();
++
++ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhC));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vech16x4_res[indx] != expectedh4_3[indx])
++ abort ();
++
++ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhD));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vech16x4_res[indx] != expectedh4_4[indx])
++ abort ();
++}
++
++void
++check_v4hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
++ uint16_t elemuhD)
++{
++ int indx;
++ const uint16_t vecuh16x4_buf[4] = {AUH, BUH, CUH, DUH};
++ uint16x4_t vecuh16x4_src = vld1_u16 (vecuh16x4_buf);
++ uint16_t vecuh16x4_res[4];
++
++ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhA));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecuh16x4_res[indx] != expecteduh4_1[indx])
++ abort ();
++
++ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhB));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecuh16x4_res[indx] != expecteduh4_2[indx])
++ abort ();
++
++ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhC));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecuh16x4_res[indx] != expecteduh4_3[indx])
++ abort ();
++
++ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhD));
++
++ for (indx = 0; indx < 4; indx++)
++ if (vecuh16x4_res[indx] != expecteduh4_4[indx])
++ abort ();
++
++/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.4h, v\[0-9\]+\.4h, v\[0-9\]+\.h\\\[0\\\]" 8 } } */
++}
++
++void
++check_v8hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD,
++ int16_t elemhE, int16_t elemhF, int16_t elemhG, int16_t elemhH)
++{
++ int32_t indx;
++ const int16_t vech16x8_buf[8] = {AH, BH, CH, DH, EH, FH, GH, HH};
++ int16x8_t vech16x8_src = vld1q_s16 (vech16x8_buf);
++ int16_t vech16x8_res[8];
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhA));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_1[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhB));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_2[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhC));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_3[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhD));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_4[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhE));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_5[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhF));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_6[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhG));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_7[indx])
++ abort ();
++
++ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhH));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vech16x8_res[indx] != expectedh8_8[indx])
++ abort ();
++}
++
++void
++check_v8hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
++ uint16_t elemuhD, uint16_t elemuhE, uint16_t elemuhF,
++ uint16_t elemuhG, uint16_t elemuhH)
++{
++ int indx;
++ const uint16_t vecuh16x8_buf[8] = {AUH, BUH, CUH, DUH, EUH, FUH, GUH, HUH};
++ uint16x8_t vecuh16x8_src = vld1q_u16 (vecuh16x8_buf);
++ uint16_t vecuh16x8_res[8];
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhA));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_1[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhB));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_2[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhC));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_3[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhD));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_4[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhE));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_5[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhF));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_6[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhG));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_7[indx])
++ abort ();
++
++ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhH));
++
++ for (indx = 0; indx < 8; indx++)
++ if (vecuh16x8_res[indx] != expecteduh8_8[indx])
++ abort ();
+
-+/* Expected results for vreinterpretq_p16_xx. */
-+VECT_VAR_DECL(expected_q_p16_1,poly,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_p16_2,poly,16,8) [] = { 0xfff0, 0xfff1,
-+ 0xfff2, 0xfff3,
-+ 0xfff4, 0xfff5,
-+ 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_p16_3,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xfff2, 0xffff,
-+ 0xfff3, 0xffff };
-+VECT_VAR_DECL(expected_q_p16_4,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_q_p16_5,poly,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_p16_6,poly,16,8) [] = { 0xfff0, 0xfff1,
-+ 0xfff2, 0xfff3,
-+ 0xfff4, 0xfff5,
-+ 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_p16_7,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xfff2, 0xffff,
-+ 0xfff3, 0xffff };
-+VECT_VAR_DECL(expected_q_p16_8,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_q_p16_9,poly,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_p16_10,poly,16,8) [] = { 0xcc00, 0xcb80,
-+ 0xcb00, 0xca80,
-+ 0xca00, 0xc980,
-+ 0xc900, 0xc880 };
-
- /* Expected results for vreinterpret_f32_xx. */
- VECT_VAR_DECL(expected_f32_1,hfloat,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
-@@ -382,6 +503,7 @@ VECT_VAR_DECL(expected_f32_7,hfloat,32,2) [] = { 0xfffffff0, 0xfffffff1 };
- VECT_VAR_DECL(expected_f32_8,hfloat,32,2) [] = { 0xfffffff0, 0xffffffff };
- VECT_VAR_DECL(expected_f32_9,hfloat,32,2) [] = { 0xf3f2f1f0, 0xf7f6f5f4 };
- VECT_VAR_DECL(expected_f32_10,hfloat,32,2) [] = { 0xfff1fff0, 0xfff3fff2 };
-+VECT_VAR_DECL(expected_f32_11,hfloat,32,2) [] = { 0xcb80cc00, 0xca80cb00 };
-
- /* Expected results for vreinterpretq_f32_xx. */
- VECT_VAR_DECL(expected_q_f32_1,hfloat,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
-@@ -404,8 +526,10 @@ VECT_VAR_DECL(expected_q_f32_9,hfloat,32,4) [] = { 0xf3f2f1f0, 0xf7f6f5f4,
- 0xfbfaf9f8, 0xfffefdfc };
- VECT_VAR_DECL(expected_q_f32_10,hfloat,32,4) [] = { 0xfff1fff0, 0xfff3fff2,
- 0xfff5fff4, 0xfff7fff6 };
-+VECT_VAR_DECL(expected_q_f32_11,hfloat,32,4) [] = { 0xcb80cc00, 0xca80cb00,
-+ 0xc980ca00, 0xc880c900 };
-
--/* Expected results for vreinterpretq_xx_f32. */
-+/* Expected results for vreinterpret_xx_f32. */
- VECT_VAR_DECL(expected_xx_f32_1,int,8,8) [] = { 0x0, 0x0, 0x80, 0xc1,
- 0x0, 0x0, 0x70, 0xc1 };
- VECT_VAR_DECL(expected_xx_f32_2,int,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
-@@ -419,6 +543,7 @@ VECT_VAR_DECL(expected_xx_f32_8,uint,64,1) [] = { 0xc1700000c1800000 };
- VECT_VAR_DECL(expected_xx_f32_9,poly,8,8) [] = { 0x0, 0x0, 0x80, 0xc1,
- 0x0, 0x0, 0x70, 0xc1 };
- VECT_VAR_DECL(expected_xx_f32_10,poly,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
-+VECT_VAR_DECL(expected_xx_f32_11,hfloat,16,4) [] = { 0x0, 0xc180, 0x0, 0xc170 };
-
- /* Expected results for vreinterpretq_xx_f32. */
- VECT_VAR_DECL(expected_q_xx_f32_1,int,8,16) [] = { 0x0, 0x0, 0x80, 0xc1,
-@@ -447,6 +572,62 @@ VECT_VAR_DECL(expected_q_xx_f32_9,poly,8,16) [] = { 0x0, 0x0, 0x80, 0xc1,
- 0x0, 0x0, 0x50, 0xc1 };
- VECT_VAR_DECL(expected_q_xx_f32_10,poly,16,8) [] = { 0x0, 0xc180, 0x0, 0xc170,
- 0x0, 0xc160, 0x0, 0xc150 };
-+VECT_VAR_DECL(expected_q_xx_f32_11,hfloat,16,8) [] = { 0x0, 0xc180, 0x0, 0xc170,
-+ 0x0, 0xc160, 0x0, 0xc150 };
++/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.8h, v\[0-9\]+\.8h, v\[0-9\]+\.h\\\[0\\\]" 16 } } */
++}
+
-+/* Expected results for vreinterpret_f16_xx. */
-+VECT_VAR_DECL(expected_f16_1,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-+VECT_VAR_DECL(expected_f16_2,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
-+VECT_VAR_DECL(expected_f16_3,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
-+VECT_VAR_DECL(expected_f16_4,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_f16_5,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-+VECT_VAR_DECL(expected_f16_6,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
-+VECT_VAR_DECL(expected_f16_7,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xfff1, 0xffff };
-+VECT_VAR_DECL(expected_f16_8,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_f16_9,hfloat,16,4) [] = { 0xf1f0, 0xf3f2, 0xf5f4, 0xf7f6 };
-+VECT_VAR_DECL(expected_f16_10,hfloat,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
++int
++main (void)
++{
++ check_v2sf (_elemA, _elemB);
++ check_v4sf (_elemA, _elemB, _elemC, _elemD);
++ check_v2df (_elemdC, _elemdD);
++ check_v2si (_elemsA, _elemsB);
++ check_v4si (_elemsA, _elemsB, _elemsC, _elemsD);
++ check_v4hi (_elemhA, _elemhB, _elemhC, _elemhD);
++ check_v8hi (_elemhA, _elemhB, _elemhC, _elemhD,
++ _elemhE, _elemhF, _elemhG, _elemhH);
++ check_v2si_unsigned (_elemusA, _elemusB);
++ check_v4si_unsigned (_elemusA, _elemusB, _elemusC, _elemusD);
++ check_v4hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD);
++ check_v8hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD,
++ _elemuhE, _elemuhF, _elemuhG, _elemuhH);
+
-+/* Expected results for vreinterpretq_f16_xx. */
-+VECT_VAR_DECL(expected_q_f16_1,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_f16_2,hfloat,16,8) [] = { 0xfff0, 0xfff1,
-+ 0xfff2, 0xfff3,
-+ 0xfff4, 0xfff5,
-+ 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_f16_3,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xfff2, 0xffff,
-+ 0xfff3, 0xffff };
-+VECT_VAR_DECL(expected_q_f16_4,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_q_f16_5,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_f16_6,hfloat,16,8) [] = { 0xfff0, 0xfff1,
-+ 0xfff2, 0xfff3,
-+ 0xfff4, 0xfff5,
-+ 0xfff6, 0xfff7 };
-+VECT_VAR_DECL(expected_q_f16_7,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xfff2, 0xffff,
-+ 0xfff3, 0xffff };
-+VECT_VAR_DECL(expected_q_f16_8,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(expected_q_f16_9,hfloat,16,8) [] = { 0xf1f0, 0xf3f2,
-+ 0xf5f4, 0xf7f6,
-+ 0xf9f8, 0xfbfa,
-+ 0xfdfc, 0xfffe };
-+VECT_VAR_DECL(expected_q_f16_10,hfloat,16,8) [] = { 0xfff0, 0xfff1,
-+ 0xfff2, 0xfff3,
-+ 0xfff4, 0xfff5,
-+ 0xfff6, 0xfff7 };
-
- #define TEST_MSG "VREINTERPRET/VREINTERPRETQ"
-
-@@ -484,7 +665,9 @@ void exec_vreinterpret (void)
-
- /* Initialize input "vector" from "buffer". */
- TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
-+ VLOAD(vector, buffer, , float, f, 16, 4);
- VLOAD(vector, buffer, , float, f, 32, 2);
-+ VLOAD(vector, buffer, q, float, f, 16, 8);
- VLOAD(vector, buffer, q, float, f, 32, 4);
-
- /* vreinterpret_s8_xx. */
-@@ -497,6 +680,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, int, s, 8, 8, uint, u, 64, 1, expected_s8_7);
- TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 8, 8, expected_s8_8);
- TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 16, 4, expected_s8_9);
-+ TEST_VREINTERPRET(, int, s, 8, 8, float, f, 16, 4, expected_s8_10);
-
- /* vreinterpret_s16_xx. */
- TEST_VREINTERPRET(, int, s, 16, 4, int, s, 8, 8, expected_s16_1);
-@@ -508,6 +692,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, int, s, 16, 4, uint, u, 64, 1, expected_s16_7);
- TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 8, 8, expected_s16_8);
- TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 16, 4, expected_s16_9);
-+ TEST_VREINTERPRET(, int, s, 16, 4, float, f, 16, 4, expected_s16_10);
-
- /* vreinterpret_s32_xx. */
- TEST_VREINTERPRET(, int, s, 32, 2, int, s, 8, 8, expected_s32_1);
-@@ -519,6 +704,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, int, s, 32, 2, uint, u, 64, 1, expected_s32_7);
- TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 8, 8, expected_s32_8);
- TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 16, 4, expected_s32_9);
-+ TEST_VREINTERPRET(, int, s, 32, 2, float, f, 16, 4, expected_s32_10);
++ return 0;
++}
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/struct_return.c
+@@ -0,0 +1,31 @@
++/* Test the absence of a spurious move from x8 to x0 for functions
++   returning structures.  */
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++
++struct s
++{
++ long x;
++ long y;
++ long z;
++};
++
++struct s __attribute__((noinline))
++foo (long a, long d, long c)
++{
++ struct s b;
++ b.x = a;
++ b.y = d;
++ b.z = c;
++ return b;
++}
++
++int
++main (void)
++{
++ struct s x;
++ x = foo ( 10, 20, 30);
++ return x.x + x.y + x.z;
++}
++
++/* { dg-final { scan-assembler-not "mov\tx0, x8" } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
+@@ -4,8 +4,7 @@
+ * total frame size > 512.
+ area except outgoing <= 512
+ * number of callee-saved reg >= 2.
+- * Split stack adjustment into two subtractions.
+- the first subtractions could be optimized into "stp !". */
++ * Use a single stack adjustment, no writeback. */
- /* vreinterpret_s64_xx. */
- TEST_VREINTERPRET(, int, s, 64, 1, int, s, 8, 8, expected_s64_1);
-@@ -530,6 +716,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, int, s, 64, 1, uint, u, 64, 1, expected_s64_7);
- TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 8, 8, expected_s64_8);
- TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 16, 4, expected_s64_9);
-+ TEST_VREINTERPRET(, int, s, 64, 1, float, f, 16, 4, expected_s64_10);
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
+@@ -15,6 +14,6 @@
+ t_frame_pattern_outgoing (test10, 480, "x19", 24, a[8], a[9], a[10])
+ t_frame_run (test10)
- /* vreinterpret_u8_xx. */
- TEST_VREINTERPRET(, uint, u, 8, 8, int, s, 8, 8, expected_u8_1);
-@@ -541,6 +728,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, uint, u, 8, 8, uint, u, 64, 1, expected_u8_7);
- TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 8, 8, expected_u8_8);
- TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 16, 4, expected_u8_9);
-+ TEST_VREINTERPRET(, uint, u, 8, 8, float, f, 16, 4, expected_u8_10);
+-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+-/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" 1 } } */
++/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
++/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
- /* vreinterpret_u16_xx. */
- TEST_VREINTERPRET(, uint, u, 16, 4, int, s, 8, 8, expected_u16_1);
-@@ -552,6 +740,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, uint, u, 16, 4, uint, u, 64, 1, expected_u16_7);
- TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 8, 8, expected_u16_8);
- TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 16, 4, expected_u16_9);
-+ TEST_VREINTERPRET(, uint, u, 16, 4, float, f, 16, 4, expected_u16_10);
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
+@@ -13,6 +13,6 @@ t_frame_run (test12)
- /* vreinterpret_u32_xx. */
- TEST_VREINTERPRET(, uint, u, 32, 2, int, s, 8, 8, expected_u32_1);
-@@ -563,6 +752,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, uint, u, 32, 2, uint, u, 64, 1, expected_u32_7);
- TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 8, 8, expected_u32_8);
- TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 16, 4, expected_u32_9);
-+ TEST_VREINTERPRET(, uint, u, 32, 2, float, f, 16, 4, expected_u32_10);
+ /* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
- /* vreinterpret_u64_xx. */
- TEST_VREINTERPRET(, uint, u, 64, 1, int, s, 8, 8, expected_u64_1);
-@@ -574,6 +764,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, uint, u, 64, 1, uint, u, 32, 2, expected_u64_7);
- TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 8, 8, expected_u64_8);
- TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 16, 4, expected_u64_9);
-+ TEST_VREINTERPRET(, uint, u, 64, 1, float, f, 16, 4, expected_u64_10);
+-/* Check epilogue using write-back. */
+-/* { dg-final { scan-assembler-times "ldp\tx29, x30, \\\[sp\\\], \[0-9\]+" 3 } } */
++/* Check epilogue using no write-back. */
++/* { dg-final { scan-assembler-times "ldp\tx29, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
- /* vreinterpret_p8_xx. */
- TEST_VREINTERPRET_POLY(, poly, p, 8, 8, int, s, 8, 8, expected_p8_1);
-@@ -585,6 +776,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET_POLY(, poly, p, 8, 8, uint, u, 32, 2, expected_p8_7);
- TEST_VREINTERPRET_POLY(, poly, p, 8, 8, uint, u, 64, 1, expected_p8_8);
- TEST_VREINTERPRET_POLY(, poly, p, 8, 8, poly, p, 16, 4, expected_p8_9);
-+ TEST_VREINTERPRET_POLY(, poly, p, 8, 8, float, f, 16, 4, expected_p8_10);
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_13.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_13.c
+@@ -2,8 +2,7 @@
+ * without outgoing.
+ * total frame size > 512.
+ * number of callee-save reg >= 2.
+- * split the stack adjustment into two substractions,
+- the second could be optimized into "stp !". */
++ * Use a single stack adjustment, no writeback. */
- /* vreinterpret_p16_xx. */
- TEST_VREINTERPRET_POLY(, poly, p, 16, 4, int, s, 8, 8, expected_p16_1);
-@@ -596,6 +788,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET_POLY(, poly, p, 16, 4, uint, u, 32, 2, expected_p16_7);
- TEST_VREINTERPRET_POLY(, poly, p, 16, 4, uint, u, 64, 1, expected_p16_8);
- TEST_VREINTERPRET_POLY(, poly, p, 16, 4, poly, p, 8, 8, expected_p16_9);
-+ TEST_VREINTERPRET_POLY(, poly, p, 16, 4, float, f, 16, 4, expected_p16_10);
+ /* { dg-do run } */
+ /* { dg-options "-O2 --save-temps" } */
+@@ -14,4 +13,4 @@ t_frame_pattern (test13, 700, )
+ t_frame_run (test13)
- /* vreinterpretq_s8_xx. */
- TEST_VREINTERPRET(q, int, s, 8, 16, int, s, 16, 8, expected_q_s8_1);
-@@ -607,6 +800,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, int, s, 8, 16, uint, u, 64, 2, expected_q_s8_7);
- TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 8, 16, expected_q_s8_8);
- TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 16, 8, expected_q_s8_9);
-+ TEST_VREINTERPRET(q, int, s, 8, 16, float, f, 16, 8, expected_q_s8_10);
+ /* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
+-/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" 2 } } */
++/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp\\\]" 1 } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_15.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_15.c
+@@ -3,8 +3,7 @@
+ * total frame size > 512.
+ area except outgoing <= 512
+ * number of callee-save reg >= 2.
+- * split the stack adjustment into two substractions,
+- the first could be optimized into "stp !". */
++ * Use a single stack adjustment, no writeback. */
- /* vreinterpretq_s16_xx. */
- TEST_VREINTERPRET(q, int, s, 16, 8, int, s, 8, 16, expected_q_s16_1);
-@@ -618,6 +812,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, int, s, 16, 8, uint, u, 64, 2, expected_q_s16_7);
- TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 8, 16, expected_q_s16_8);
- TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 16, 8, expected_q_s16_9);
-+ TEST_VREINTERPRET(q, int, s, 16, 8, float, f, 16, 8, expected_q_s16_10);
+ /* { dg-do run } */
+ /* { dg-options "-O2 --save-temps" } */
+@@ -15,4 +14,4 @@ t_frame_pattern_outgoing (test15, 480, , 8, a[8])
+ t_frame_run (test15)
- /* vreinterpretq_s32_xx. */
- TEST_VREINTERPRET(q, int, s, 32, 4, int, s, 8, 16, expected_q_s32_1);
-@@ -629,6 +824,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, int, s, 32, 4, uint, u, 64, 2, expected_q_s32_7);
- TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 8, 16, expected_q_s32_8);
- TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 16, 8, expected_q_s32_9);
-+ TEST_VREINTERPRET(q, int, s, 32, 4, float, f, 16, 8, expected_q_s32_10);
+ /* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
+-/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]!" 3 } } */
++/* { dg-final { scan-assembler-times "stp\tx29, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_16.c
+@@ -0,0 +1,25 @@
++/* Verify:
++ * with outgoing.
++ * single int register push.
++ * varargs and callee-save size >= 256
++ * Use 2 stack adjustments. */
++
++/* { dg-do compile } */
++/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
++
++#define REP8(X) X,X,X,X,X,X,X,X
++#define REP64(X) REP8(REP8(X))
++
++void outgoing (__builtin_va_list, ...);
++
++double vararg_outgoing (int x1, ...)
++{
++ double a1 = x1, a2 = x1 * 2, a3 = x1 * 3, a4 = x1 * 4, a5 = x1 * 5, a6 = x1 * 6;
++ __builtin_va_list vl;
++ __builtin_va_start (vl, x1);
++ outgoing (vl, a1, a2, a3, a4, a5, a6, REP64 (1));
++ __builtin_va_end (vl);
++ return a1 + a2 + a3 + a4 + a5 + a6;
++}
++
++/* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 2 } } */
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_6.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_6.c
+@@ -3,8 +3,7 @@
+ * without outgoing.
+ * total frame size > 512.
+ * number of callee-saved reg == 1.
+- * split stack adjustment into two subtractions.
+- the second subtraction should use "str !". */
++ * use a single stack adjustment, no writeback. */
- /* vreinterpretq_s64_xx. */
- TEST_VREINTERPRET(q, int, s, 64, 2, int, s, 8, 16, expected_q_s64_1);
-@@ -640,6 +836,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, int, s, 64, 2, uint, u, 64, 2, expected_q_s64_7);
- TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 8, 16, expected_q_s64_8);
- TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 16, 8, expected_q_s64_9);
-+ TEST_VREINTERPRET(q, int, s, 64, 2, float, f, 16, 8, expected_q_s64_10);
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
+@@ -14,6 +13,7 @@
+ t_frame_pattern (test6, 700, )
+ t_frame_run (test6)
- /* vreinterpretq_u8_xx. */
- TEST_VREINTERPRET(q, uint, u, 8, 16, int, s, 8, 16, expected_q_u8_1);
-@@ -651,6 +848,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, uint, u, 8, 16, uint, u, 64, 2, expected_q_u8_7);
- TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 8, 16, expected_q_u8_8);
- TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 16, 8, expected_q_u8_9);
-+ TEST_VREINTERPRET(q, uint, u, 8, 16, float, f, 16, 8, expected_q_u8_10);
+-/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, -\[0-9\]+\\\]!" 2 } } */
+-/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\], \[0-9\]+" 2 } } */
++/* { dg-final { scan-assembler-times "str\tx30, \\\[sp\\\]" 1 } } */
++/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\]" 2 } } */
++/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\]," 1 } } */
- /* vreinterpretq_u16_xx. */
- TEST_VREINTERPRET(q, uint, u, 16, 8, int, s, 8, 16, expected_q_u16_1);
-@@ -662,6 +860,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, uint, u, 16, 8, uint, u, 64, 2, expected_q_u16_7);
- TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 8, 16, expected_q_u16_8);
- TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 16, 8, expected_q_u16_9);
-+ TEST_VREINTERPRET(q, uint, u, 16, 8, float, f, 16, 8, expected_q_u16_10);
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
+@@ -3,8 +3,7 @@
+ * without outgoing.
+ * total frame size > 512.
+ * number of callee-saved reg == 2.
+- * split stack adjustment into two subtractions.
+- the second subtraction should use "stp !". */
++ * use a single stack adjustment, no writeback. */
- /* vreinterpretq_u32_xx. */
- TEST_VREINTERPRET(q, uint, u, 32, 4, int, s, 8, 16, expected_q_u32_1);
-@@ -673,6 +872,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, uint, u, 32, 4, uint, u, 64, 2, expected_q_u32_7);
- TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 8, 16, expected_q_u32_8);
- TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 16, 8, expected_q_u32_9);
-+ TEST_VREINTERPRET(q, uint, u, 32, 4, float, f, 16, 8, expected_q_u32_10);
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
+@@ -14,6 +13,6 @@
+ t_frame_pattern (test7, 700, "x19")
+ t_frame_run (test7)
- /* vreinterpretq_u64_xx. */
- TEST_VREINTERPRET(q, uint, u, 64, 2, int, s, 8, 16, expected_q_u64_1);
-@@ -684,6 +884,31 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, uint, u, 64, 2, uint, u, 32, 4, expected_q_u64_7);
- TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 8, 16, expected_q_u64_8);
- TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 16, 8, expected_q_u64_9);
-+ TEST_VREINTERPRET(q, uint, u, 64, 2, float, f, 16, 8, expected_q_u64_10);
+-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+-/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" 1 } } */
++/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp]" 1 } } */
++/* { dg-final { scan-assembler-times "ldp\tx19, x30, \\\[sp\\\]" 1 } } */
+
+--- a/src/gcc/testsuite/gcc.target/aarch64/test_frame_8.c
++++ b/src/gcc/testsuite/gcc.target/aarch64/test_frame_8.c
+@@ -12,6 +12,6 @@
+ t_frame_pattern_outgoing (test8, 700, , 8, a[8])
+ t_frame_run (test8)
+
+-/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, -\[0-9\]+\\\]!" 3 } } */
+-/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp\\\], \[0-9\]+" 3 } } */
++/* { dg-final { scan-assembler-times "str\tx30, \\\[sp, \[0-9\]+\\\]" 1 } } */
++/* { dg-final { scan-assembler-times "ldr\tx30, \\\[sp, \[0-9\]+\\\]" 1 } } */
+
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/thunderxloadpair.c
+@@ -0,0 +1,20 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -mcpu=thunderx" } */
++
++struct ldp
++{
++ long long c;
++ int a, b;
++};
++
++
++int f(struct ldp *a)
++{
++ return a->a + a->b;
++}
++
++
++/* We know the alignment of a->a to be 8-byte aligned so it is profitable
++ to do ldp. */
++/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 1 } } */
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/thunderxnoloadpair.c
+@@ -0,0 +1,17 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 -mcpu=thunderx" } */
++
++struct noldp
++{
++ int a, b;
++};
++
++
++int f(struct noldp *a)
++{
++ return a->a + a->b;
++}
++
++/* We know the alignment of a->a to be 4-byte aligned so it is not profitable
++ to do ldp. */
++/* { dg-final { scan-assembler-not "ldp\tw\[0-9\]+, w\[0-9\]" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
+@@ -0,0 +1,11 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 --save-temps" } */
++
++int
++f (int a, ...)
++{
++ /* { dg-final { scan-assembler-not "str" } } */
++ return a;
++}
++
++/* { dg-final { cleanup-saved-temps } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_2.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 --save-temps" } */
++
++int
++foo (char *fmt, ...)
++{
++ int d;
++ __builtin_va_list ap;
++
++ __builtin_va_start (ap, fmt);
++ d = __builtin_va_arg (ap, int);
++ __builtin_va_end (ap);
++
++ /* { dg-final { scan-assembler-not "x7" } } */
++ return d;
++}
++
++/* { dg-final { cleanup-saved-temps } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_3.c
+@@ -0,0 +1,26 @@
++/* { dg-do compile } */
++/* { dg-options "-O2 --save-temps" } */
++
++int d2i (double a);
++
++int
++foo (char *fmt, ...)
++{
++ int d, e;
++ double f, g;
++ __builtin_va_list ap;
++
++ __builtin_va_start (ap, fmt);
++ d = __builtin_va_arg (ap, int);
++ f = __builtin_va_arg (ap, double);
++ g = __builtin_va_arg (ap, double);
++ d += d2i (f);
++ d += d2i (g);
++ __builtin_va_end (ap);
++
++ /* { dg-final { scan-assembler-not "x7" } } */
++ /* { dg-final { scan-assembler-not "q7" } } */
++ return d;
++}
++
++/* { dg-final { cleanup-saved-temps } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
+@@ -0,0 +1,86 @@
++/* { dg-do compile } */
++/* { dg-options "-O3" } */
++
++#include "arm_neon.h"
++
++#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \
++TYPE1 __attribute__((noinline,noclone)) \
++test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) \
++{ \
++ return vcopy##Q1##_lane##Q2##_##SUFFIX (a, INDEX1, b, INDEX2); \
++}
++
++/* vcopy_lane. */
++BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
++BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6)
++BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
++BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
++BUILD_TEST (int16x4_t, int16x4_t, , , s16, 3, 2)
++BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
++BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
++BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
++BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
++BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
++BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
++BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
++/* { dg-final { scan-assembler-times "fmov\\td0, d1" 3 } } */
++
++/* vcopy_laneq. */
++
++BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
++BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15)
++BUILD_TEST (uint8x8_t, uint8x16_t, , q, u8, 7, 15)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[15\\\]" 3 } } */
++BUILD_TEST (poly16x4_t, poly16x8_t, , q, p16, 3, 7)
++BUILD_TEST (int16x4_t, int16x8_t, , q, s16, 3, 7)
++BUILD_TEST (uint16x4_t, uint16x8_t, , q, u16, 3, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[7\\\]" 3 } } */
++BUILD_TEST (float32x2_t, float32x4_t, , q, f32, 1, 3)
++BUILD_TEST (int32x2_t, int32x4_t, , q, s32, 1, 3)
++BUILD_TEST (uint32x2_t, uint32x4_t, , q, u32, 1, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[3\\\]" 3 } } */
++BUILD_TEST (float64x1_t, float64x2_t, , q, f64, 0, 1)
++BUILD_TEST (int64x1_t, int64x2_t, , q, s64, 0, 1)
++BUILD_TEST (uint64x1_t, uint64x2_t, , q, u64, 0, 1)
++/* XFAIL due to PR 71307. */
++/* { dg-final { scan-assembler-times "dup\\td0, v1.d\\\[1\\\]" 3 { xfail *-*-* } } } */
++
++/* vcopyq_lane. */
++BUILD_TEST (poly8x16_t, poly8x8_t, q, , p8, 15, 7)
++BUILD_TEST (int8x16_t, int8x8_t, q, , s8, 15, 7)
++BUILD_TEST (uint8x16_t, uint8x8_t, q, , u8, 15, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[15\\\], v1.b\\\[7\\\]" 3 } } */
++BUILD_TEST (poly16x8_t, poly16x4_t, q, , p16, 7, 3)
++BUILD_TEST (int16x8_t, int16x4_t, q, , s16, 7, 3)
++BUILD_TEST (uint16x8_t, uint16x4_t, q, , u16, 7, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[7\\\], v1.h\\\[3\\\]" 3 } } */
++BUILD_TEST (float32x4_t, float32x2_t, q, , f32, 3, 1)
++BUILD_TEST (int32x4_t, int32x2_t, q, , s32, 3, 1)
++BUILD_TEST (uint32x4_t, uint32x2_t, q, , u32, 3, 1)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[3\\\], v1.s\\\[1\\\]" 3 } } */
++BUILD_TEST (float64x2_t, float64x1_t, q, , f64, 1, 0)
++BUILD_TEST (int64x2_t, int64x1_t, q, , s64, 1, 0)
++BUILD_TEST (uint64x2_t, uint64x1_t, q, , u64, 1, 0)
++/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[0\\\]" 3 } } */
++
++/* vcopyq_laneq. */
++
++BUILD_TEST (poly8x16_t, poly8x16_t, q, q, p8, 14, 15)
++BUILD_TEST (int8x16_t, int8x16_t, q, q, s8, 14, 15)
++BUILD_TEST (uint8x16_t, uint8x16_t, q, q, u8, 14, 15)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[14\\\], v1.b\\\[15\\\]" 3 } } */
++BUILD_TEST (poly16x8_t, poly16x8_t, q, q, p16, 6, 7)
++BUILD_TEST (int16x8_t, int16x8_t, q, q, s16, 6, 7)
++BUILD_TEST (uint16x8_t, uint16x8_t, q, q, u16, 6, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[6\\\], v1.h\\\[7\\\]" 3 } } */
++BUILD_TEST (float32x4_t, float32x4_t, q, q, f32, 2, 3)
++BUILD_TEST (int32x4_t, int32x4_t, q, q, s32, 2, 3)
++BUILD_TEST (uint32x4_t, uint32x4_t, q, q, u32, 2, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[2\\\], v1.s\\\[3\\\]" 3 } } */
++BUILD_TEST (float64x2_t, float64x2_t, q, q, f64, 1, 1)
++BUILD_TEST (int64x2_t, int64x2_t, q, q, s64, 1, 1)
++BUILD_TEST (uint64x2_t, uint64x2_t, q, q, u64, 1, 1)
++/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[1\\\]" 3 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
+@@ -0,0 +1,72 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++
++#include "arm_neon.h"
++
++#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \
++TYPE1 __attribute__((noinline,noclone)) \
++test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) \
++{ \
++ return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
++ a, INDEX1); \
++}
++
++BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
++BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6)
++BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
++BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
++BUILD_TEST (int16x4_t, int16x4_t, , , s16, 3, 2)
++BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
++BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
++BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
++BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
++
++BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
++BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15)
++BUILD_TEST (uint8x8_t, uint8x16_t, , q, u8, 7, 15)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[15\\\]" 3 } } */
++BUILD_TEST (poly16x4_t, poly16x8_t, , q, p16, 3, 7)
++BUILD_TEST (int16x4_t, int16x8_t, , q, s16, 3, 7)
++BUILD_TEST (uint16x4_t, uint16x8_t, , q, u16, 3, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[7\\\]" 3 } } */
++BUILD_TEST (float32x2_t, float32x4_t, , q, f32, 1, 3)
++BUILD_TEST (int32x2_t, int32x4_t, , q, s32, 1, 3)
++BUILD_TEST (uint32x2_t, uint32x4_t, , q, u32, 1, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[3\\\]" 3 } } */
++
++BUILD_TEST (poly8x16_t, poly8x8_t, q, , p8, 15, 7)
++BUILD_TEST (int8x16_t, int8x8_t, q, , s8, 15, 7)
++BUILD_TEST (uint8x16_t, uint8x8_t, q, , u8, 15, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[15\\\], v1.b\\\[7\\\]" 3 } } */
++BUILD_TEST (poly16x8_t, poly16x4_t, q, , p16, 7, 3)
++BUILD_TEST (int16x8_t, int16x4_t, q, , s16, 7, 3)
++BUILD_TEST (uint16x8_t, uint16x4_t, q, , u16, 7, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[7\\\], v1.h\\\[3\\\]" 3 } } */
++BUILD_TEST (float32x4_t, float32x2_t, q, , f32, 3, 1)
++BUILD_TEST (int32x4_t, int32x2_t, q, , s32, 3, 1)
++BUILD_TEST (uint32x4_t, uint32x2_t, q, , u32, 3, 1)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[3\\\], v1.s\\\[1\\\]" 3 } } */
++BUILD_TEST (float64x2_t, float64x1_t, q, , f64, 1, 0)
++BUILD_TEST (int64x2_t, int64x1_t, q, , s64, 1, 0)
++BUILD_TEST (uint64x2_t, uint64x1_t, q, , u64, 1, 0)
++/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[0\\\]" 3 } } */
++
++BUILD_TEST (poly8x16_t, poly8x16_t, q, q, p8, 14, 15)
++BUILD_TEST (int8x16_t, int8x16_t, q, q, s8, 14, 15)
++BUILD_TEST (uint8x16_t, uint8x16_t, q, q, u8, 14, 15)
++/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[14\\\], v1.b\\\[15\\\]" 3 } } */
++BUILD_TEST (poly16x8_t, poly16x8_t, q, q, p16, 6, 7)
++BUILD_TEST (int16x8_t, int16x8_t, q, q, s16, 6, 7)
++BUILD_TEST (uint16x8_t, uint16x8_t, q, q, u16, 6, 7)
++/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[6\\\], v1.h\\\[7\\\]" 3 } } */
++BUILD_TEST (float32x4_t, float32x4_t, q, q, f32, 2, 3)
++BUILD_TEST (int32x4_t, int32x4_t, q, q, s32, 2, 3)
++BUILD_TEST (uint32x4_t, uint32x4_t, q, q, u32, 2, 3)
++/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[2\\\], v1.s\\\[3\\\]" 3 } } */
++BUILD_TEST (float64x2_t, float64x2_t, q, q, f64, 1, 1)
++BUILD_TEST (int64x2_t, int64x2_t, q, q, s64, 1, 1)
++BUILD_TEST (uint64x2_t, uint64x2_t, q, q, u64, 1, 1)
++/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[1\\\]" 3 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/aarch64/vminmaxnm.c
+@@ -0,0 +1,37 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++
++#include "arm_neon.h"
++
++/* For each of these intrinsics, we map directly to an unspec in RTL.
++ We're just using the argument directly and returning the result, so we
++ can precisely specify the exact instruction pattern and register
++ allocations we expect. */
++
++float64x1_t
++test_vmaxnm_f64 (float64x1_t a, float64x1_t b)
++{
++ /* { dg-final { scan-assembler-times "fmaxnm\td0, d0, d1" 1 } } */
++ return vmaxnm_f64 (a, b);
++}
++
++float64x1_t
++test_vminnm_f64 (float64x1_t a, float64x1_t b)
++{
++ /* { dg-final { scan-assembler-times "fminnm\td0, d0, d1" 1 } } */
++ return vminnm_f64 (a, b);
++}
++
++float64x1_t
++test_vmax_f64 (float64x1_t a, float64x1_t b)
++{
++ /* { dg-final { scan-assembler-times "fmax\td0, d0, d1" 1 } } */
++ return vmax_f64 (a, b);
++}
++
++float64x1_t
++test_vmin_f64 (float64x1_t a, float64x1_t b)
++{
++ /* { dg-final { scan-assembler-times "fmin\td0, d0, d1" 1 } } */
++ return vmin_f64 (a, b);
++}
+\ No newline at end of file
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
+@@ -0,0 +1,32 @@
++/* Test AAPCS layout (VFP variant for Neon types) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_neon_fp16_hw } */
++/* { dg-add-options arm_neon_fp16 } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define NEON
++#define TESTFILE "neon-vect10.c"
++#include "neon-constants.h"
++
++#include "abitest.h"
++#else
++
++ARG (int32x4_t, i32x4_constvec2, Q0) /* D0, D1. */
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 3.0f, S4 + 2) /* D2, Q1. */
++#else
++ARG (__fp16, 3.0f, S4) /* D2, Q1. */
++#endif
++ARG (int32x4x2_t, i32x4x2_constvec1, Q2) /* Q2, Q3 - D4-D6 , s5-s12. */
++ARG (double, 12.0, D3) /* Backfill this particular argument. */
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 5.0f, S5 + 2) /* Backfill in S5. */
++#else
++ARG (__fp16, 5.0f, S5) /* Backfill in S5. */
++#endif
++ARG (int32x4x2_t, i32x4x2_constvec2, STACK)
++LAST_ARG (int, 3, R0)
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
+@@ -0,0 +1,24 @@
++/* Test AAPCS layout (VFP variant for Neon types) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_neon_fp16_hw } */
++/* { dg-add-options arm_neon_fp16 } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define NEON
++#define TESTFILE "neon-vect9.c"
++#include "neon-constants.h"
++
++#include "abitest.h"
++#else
++
++ARG (int32x4_t, i32x4_constvec2, Q0) /* D0, D1. */
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 3.0f, S4 + 2) /* D2, Q1 occupied. */
++#else
++ARG (__fp16, 3.0f, S4) /* D2, Q1 occupied. */
++#endif
++LAST_ARG (int, 3, R0)
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c
+@@ -0,0 +1,28 @@
++/* Test AAPCS layout (VFP variant) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_fp16_hw } */
++/* { dg-add-options arm_fp16_ieee } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define TESTFILE "vfp18.c"
++#include "abitest.h"
++
++#else
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 1.0f, S0 + 2)
++#else
++ARG (__fp16, 1.0f, S0)
++#endif
++ARG (float, 2.0f, S1)
++ARG (double, 4.0, D1)
++ARG (float, 2.0f, S4)
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 1.0f, S5 + 2)
++#else
++ARG (__fp16, 1.0f, S5)
++#endif
++LAST_ARG (int, 3, R0)
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/vfp19.c
+@@ -0,0 +1,30 @@
++/* Test AAPCS layout (VFP variant) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_fp16_hw } */
++/* { dg-add-options arm_fp16_ieee } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define TESTFILE "vfp19.c"
++
++__complex__ x = 1.0+2.0i;
++
++#include "abitest.h"
++#else
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 1.0f, S0 + 2)
++#else
++ARG (__fp16, 1.0f, S0)
++#endif
++ARG (float, 2.0f, S1)
++ARG (__complex__ double, x, D1)
++ARG (float, 3.0f, S6)
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 2.0f, S7 + 2)
++#else
++ARG (__fp16, 2.0f, S7)
++#endif
++LAST_ARG (int, 3, R0)
++#endif
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/vfp20.c
+@@ -0,0 +1,22 @@
++/* Test AAPCS layout (VFP variant) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_fp16_hw } */
++/* { dg-add-options arm_fp16_ieee } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define TESTFILE "vfp20.c"
++
++#define PCSATTR __attribute__((pcs("aapcs")))
++
++#include "abitest.h"
++#else
++ARG (float, 1.0f, R0)
++ARG (double, 2.0, R2)
++ARG (float, 3.0f, STACK)
++ARG (__fp16, 2.0f, STACK+4)
++LAST_ARG (double, 4.0, STACK+8)
++#endif
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/aapcs/vfp21.c
+@@ -0,0 +1,26 @@
++/* Test AAPCS layout (VFP variant) */
++
++/* { dg-do run { target arm_eabi } } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_fp16_hw } */
++/* { dg-add-options arm_fp16_ieee } */
++
++#ifndef IN_FRAMEWORK
++#define VFP
++#define TESTFILE "vfp21.c"
++
++#define PCSATTR __attribute__((pcs("aapcs")))
++
++#include "abitest.h"
++#else
++#if defined (__ARM_BIG_ENDIAN)
++ARG (__fp16, 1.0f, R0 + 2)
++#else
++ARG (__fp16, 1.0f, R0)
++#endif
++ARG (double, 2.0, R2)
++ARG (__fp16, 3.0f, STACK)
++ARG (float, 2.0f, STACK+4)
++LAST_ARG (double, 4.0, STACK+8)
++#endif
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/armv5_thumb_isa.c
+@@ -0,0 +1,8 @@
++/* { dg-require-effective-target arm_arch_v5_ok } */
++/* { dg-add-options arm_arch_v5 } */
++
++#if __ARM_ARCH_ISA_THUMB
++#error "__ARM_ARCH_ISA_THUMB defined for ARMv5"
++#endif
++
++int foo;
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
+@@ -0,0 +1,105 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
++/* { dg-options "-O2 -ffast-math" } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++
++/* Test instructions generated for half-precision arithmetic. */
++
++typedef __fp16 float16_t;
++typedef __simd64_float16_t float16x4_t;
++typedef __simd128_float16_t float16x8_t;
++
++typedef short int16x4_t __attribute__ ((vector_size (8)));
++typedef short int int16x8_t __attribute__ ((vector_size (16)));
++
++float16_t
++fp16_abs (float16_t a)
++{
++ return (a < 0) ? -a : a;
++}
+
-+ /* vreinterpretq_p8_xx. */
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 8, 16, expected_q_p8_1);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 16, 8, expected_q_p8_2);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 32, 4, expected_q_p8_3);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, int, s, 64, 2, expected_q_p8_4);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 8, 16, expected_q_p8_5);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 16, 8, expected_q_p8_6);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 32, 4, expected_q_p8_7);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, uint, u, 64, 2, expected_q_p8_8);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, poly, p, 16, 8, expected_q_p8_9);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, float, f, 16, 8, expected_q_p8_10);
++#define TEST_UNOP(NAME, OPERATOR, TY) \
++ TY test_##NAME##_##TY (TY a) \
++ { \
++ return OPERATOR (a); \
++ }
+
-+ /* vreinterpretq_p16_xx. */
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 8, 16, expected_q_p16_1);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 16, 8, expected_q_p16_2);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 32, 4, expected_q_p16_3);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, int, s, 64, 2, expected_q_p16_4);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 8, 16, expected_q_p16_5);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 16, 8, expected_q_p16_6);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 32, 4, expected_q_p16_7);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, uint, u, 64, 2, expected_q_p16_8);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, poly, p, 8, 16, expected_q_p16_9);
-+ TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, float, f, 16, 8, expected_q_p16_10);
-
- /* vreinterpret_f32_xx. */
- TEST_VREINTERPRET_FP(, float, f, 32, 2, int, s, 8, 8, expected_f32_1);
-@@ -696,6 +921,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET_FP(, float, f, 32, 2, uint, u, 64, 1, expected_f32_8);
- TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 8, 8, expected_f32_9);
- TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 16, 4, expected_f32_10);
-+ TEST_VREINTERPRET_FP(, float, f, 32, 2, float, f, 16, 4, expected_f32_11);
-
- /* vreinterpretq_f32_xx. */
- TEST_VREINTERPRET_FP(q, float, f, 32, 4, int, s, 8, 16, expected_q_f32_1);
-@@ -708,6 +934,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET_FP(q, float, f, 32, 4, uint, u, 64, 2, expected_q_f32_8);
- TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 8, 16, expected_q_f32_9);
- TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 16, 8, expected_q_f32_10);
-+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, float, f, 16, 8, expected_q_f32_11);
-
- /* vreinterpret_xx_f32. */
- TEST_VREINTERPRET(, int, s, 8, 8, float, f, 32, 2, expected_xx_f32_1);
-@@ -720,6 +947,7 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(, uint, u, 64, 1, float, f, 32, 2, expected_xx_f32_8);
- TEST_VREINTERPRET_POLY(, poly, p, 8, 8, float, f, 32, 2, expected_xx_f32_9);
- TEST_VREINTERPRET_POLY(, poly, p, 16, 4, float, f, 32, 2, expected_xx_f32_10);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, float, f, 32, 2, expected_xx_f32_11);
-
- /* vreinterpretq_xx_f32. */
- TEST_VREINTERPRET(q, int, s, 8, 16, float, f, 32, 4, expected_q_xx_f32_1);
-@@ -732,6 +960,31 @@ void exec_vreinterpret (void)
- TEST_VREINTERPRET(q, uint, u, 64, 2, float, f, 32, 4, expected_q_xx_f32_8);
- TEST_VREINTERPRET_POLY(q, poly, p, 8, 16, float, f, 32, 4, expected_q_xx_f32_9);
- TEST_VREINTERPRET_POLY(q, poly, p, 16, 8, float, f, 32, 4, expected_q_xx_f32_10);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, float, f, 32, 4, expected_q_xx_f32_11);
++#define TEST_BINOP(NAME, OPERATOR, TY) \
++ TY test_##NAME##_##TY (TY a, TY b) \
++ { \
++ return a OPERATOR b; \
++ }
+
-+ /* vreinterpret_f16_xx. */
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 8, 8, expected_f16_1);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 16, 4, expected_f16_2);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 32, 2, expected_f16_3);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, int, s, 64, 1, expected_f16_4);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 8, 8, expected_f16_5);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 16, 4, expected_f16_6);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 32, 2, expected_f16_7);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, uint, u, 64, 1, expected_f16_8);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 8, 8, expected_f16_9);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 16, 4, expected_f16_10);
++#define TEST_CMP(NAME, OPERATOR, RTY, TY) \
++ RTY test_##NAME##_##TY (TY a, TY b) \
++ { \
++ return a OPERATOR b; \
++ }
+
-+ /* vreinterpretq_f16_xx. */
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 8, 16, expected_q_f16_1);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 16, 8, expected_q_f16_2);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 32, 4, expected_q_f16_3);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, int, s, 64, 2, expected_q_f16_4);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 8, 16, expected_q_f16_5);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 16, 8, expected_q_f16_6);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 32, 4, expected_q_f16_7);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, uint, u, 64, 2, expected_q_f16_8);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 8, 16, expected_q_f16_9);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 16, 8, expected_q_f16_10);
- }
-
- int main (void)
++/* Scalars. */
++
++TEST_UNOP (neg, -, float16_t)
++TEST_UNOP (abs, fp16_abs, float16_t)
++
++TEST_BINOP (add, +, float16_t)
++TEST_BINOP (sub, -, float16_t)
++TEST_BINOP (mult, *, float16_t)
++TEST_BINOP (div, /, float16_t)
++
++TEST_CMP (equal, ==, int, float16_t)
++TEST_CMP (unequal, !=, int, float16_t)
++TEST_CMP (lessthan, <, int, float16_t)
++TEST_CMP (greaterthan, >, int, float16_t)
++TEST_CMP (lessthanequal, <=, int, float16_t)
++TEST_CMP (greaterthanqual, >=, int, float16_t)
++
++/* Vectors of size 4. */
++
++TEST_UNOP (neg, -, float16x4_t)
++
++TEST_BINOP (add, +, float16x4_t)
++TEST_BINOP (sub, -, float16x4_t)
++TEST_BINOP (mult, *, float16x4_t)
++TEST_BINOP (div, /, float16x4_t)
++
++TEST_CMP (equal, ==, int16x4_t, float16x4_t)
++TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
++TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
++TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
++TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
++TEST_CMP (greaterthanqual, >=, int16x4_t, float16x4_t)
++
++/* Vectors of size 8. */
++
++TEST_UNOP (neg, -, float16x8_t)
++
++TEST_BINOP (add, +, float16x8_t)
++TEST_BINOP (sub, -, float16x8_t)
++TEST_BINOP (mult, *, float16x8_t)
++TEST_BINOP (div, /, float16x8_t)
++
++TEST_CMP (equal, ==, int16x8_t, float16x8_t)
++TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
++TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
++TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
++TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
++TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
++
++/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } } */
++/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } } */
++/* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } } */
++
++/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
++/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
++/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
++/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } } */
++/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } } */
++/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } } */
++
++/* { dg-final { scan-assembler-not {vadd\.f32} } } */
++/* { dg-final { scan-assembler-not {vsub\.f32} } } */
++/* { dg-final { scan-assembler-not {vmul\.f32} } } */
++/* { dg-final { scan-assembler-not {vdiv\.f32} } } */
++/* { dg-final { scan-assembler-not {vcmp\.f16} } } */
++/* { dg-final { scan-assembler-not {vcmpe\.f16} } } */
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
-@@ -0,0 +1,160 @@
-+/* This file contains tests for the vreinterpret *p128 intrinsics. */
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
+@@ -0,0 +1,101 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++/* Test ARMv8.2 FP16 conversions. */
++#include <arm_fp16.h>
++
++float
++f16_to_f32 (__fp16 a)
++{
++ return (float)a;
++}
+
-+/* { dg-require-effective-target arm_crypto_ok } */
-+/* { dg-add-options arm_crypto } */
++float
++f16_to_pf32 (__fp16* a)
++{
++ return (float)*a;
++}
++
++short
++f16_to_s16 (__fp16 a)
++{
++ return (short)a;
++}
++
++short
++pf16_to_s16 (__fp16* a)
++{
++ return (short)*a;
++}
++
++/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } } */
++
++__fp16
++f32_to_f16 (float a)
++{
++ return (__fp16)a;
++}
++
++void
++f32_to_pf16 (__fp16* x, float a)
++{
++ *x = (__fp16)a;
++}
++
++__fp16
++s16_to_f16 (short a)
++{
++ return (__fp16)a;
++}
++
++void
++s16_to_pf16 (__fp16* x, short a)
++{
++ *x = (__fp16)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } } */
++
++float
++s16_to_f32 (short a)
++{
++ return (float)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } } */
++
++short
++f32_to_s16 (float a)
++{
++ return (short)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } } */
++
++unsigned short
++f32_to_u16 (float a)
++{
++ return (unsigned short)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } } */
++
++short
++f64_to_s16 (double a)
++{
++ return (short)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } } */
++
++unsigned short
++f64_to_u16 (double a)
++{
++ return (unsigned short)a;
++}
++
++/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } } */
++
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+@@ -0,0 +1,165 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++__fp16
++test_load_1 (__fp16* a)
++{
++ return *a;
++}
++
++__fp16
++test_load_2 (__fp16* a, int i)
++{
++ return a[i];
++}
++
++/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } } */
++
++void
++test_store_1 (__fp16* a, __fp16 b)
++{
++ *a = b;
++}
++
++void
++test_store_2 (__fp16* a, int i, __fp16 b)
++{
++ a[i] = b;
++}
++
++/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } } */
++
++__fp16
++test_load_store_1 (__fp16* a, int i, __fp16* b)
++{
++ a[i] = b[i];
++}
++
++__fp16
++test_load_store_2 (__fp16* a, int i, __fp16* b)
++{
++ a[i] = b[i + 2];
++ return a[i];
++}
++/* { dg-final { scan-assembler-times {ldrh\tr[0-9]+} 2 } } */
++/* { dg-final { scan-assembler-times {strh\tr[0-9]+} 2 } } */
++
++__fp16
++test_select_1 (int sel, __fp16 a, __fp16 b)
++{
++ if (sel)
++ return a;
++ else
++ return b;
++}
++
++__fp16
++test_select_2 (int sel, __fp16 a, __fp16 b)
++{
++ return sel ? a : b;
++}
++
++__fp16
++test_select_3 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a == b) ? b : c;
++}
++
++__fp16
++test_select_4 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a != b) ? b : c;
++}
++
++__fp16
++test_select_5 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a < b) ? b : c;
++}
++
++__fp16
++test_select_6 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a <= b) ? b : c;
++}
++
++__fp16
++test_select_7 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a > b) ? b : c;
++}
++
++__fp16
++test_select_8 (__fp16 a, __fp16 b, __fp16 c)
++{
++ return (a >= b) ? b : c;
++}
++
++/* { dg-final { scan-assembler-times {vseleq\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 4 } } */
++/* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++/* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */
++/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */
++
++int
++test_compare_1 (__fp16 a, __fp16 b)
++{
++ if (a == b)
++ return -1;
++ else
++ return 0;
++}
++
++int
++test_compare_ (__fp16 a, __fp16 b)
++{
++ if (a != b)
++ return -1;
++ else
++ return 0;
++}
++
++int
++test_compare_2 (__fp16 a, __fp16 b)
++{
++ if (a > b)
++ return -1;
++ else
++ return 0;
++}
++
++int
++test_compare_3 (__fp16 a, __fp16 b)
++{
++ if (a >= b)
++ return -1;
++ else
++ return 0;
++}
++
++int
++test_compare_4 (__fp16 a, __fp16 b)
++{
++ if (a < b)
++ return -1;
++ else
++ return 0;
++}
++
++int
++test_compare_5 (__fp16 a, __fp16 b)
++{
++ if (a <= b)
++ return -1;
++ else
++ return 0;
++}
++
++/* { dg-final { scan-assembler-not {vcmp\.f16} } } */
++/* { dg-final { scan-assembler-not {vcmpe\.f16} } } */
++
++/* { dg-final { scan-assembler-times {vcmp\.f32} 4 } } */
++/* { dg-final { scan-assembler-times {vcmpe\.f32} 8 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
+@@ -0,0 +1,490 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_v8_2a_fp16_neon } */
++
++/* Test instructions generated for the FP16 vector intrinsics. */
+
+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
+
-+/* Expected results: vreinterpretq_p128_*. */
-+VECT_VAR_DECL(vreint_expected_q_p128_s8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p128_s16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p128_s32,poly,64,2) [] = { 0xfffffff1fffffff0,
-+ 0xfffffff3fffffff2 };
-+VECT_VAR_DECL(vreint_expected_q_p128_s64,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p128_u8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p128_u16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p128_u32,poly,64,2) [] = { 0xfffffff1fffffff0,
-+ 0xfffffff3fffffff2 };
-+VECT_VAR_DECL(vreint_expected_q_p128_u64,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p128_p8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p128_p16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p128_f32,poly,64,2) [] = { 0xc1700000c1800000,
-+ 0xc1500000c1600000 };
-+VECT_VAR_DECL(vreint_expected_q_p128_f16,poly,64,2) [] = { 0xca80cb00cb80cc00,
-+ 0xc880c900c980ca00 };
++#define MSTRCAT(L, str) L##str
++
++#define UNOP_TEST(insn) \
++ float16x4_t \
++ MSTRCAT (test_##insn, _16x4) (float16x4_t a) \
++ { \
++ return MSTRCAT (insn, _f16) (a); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn, _16x8) (float16x8_t a) \
++ { \
++ return MSTRCAT (insn, q_f16) (a); \
++ }
++
++#define BINOP_TEST(insn) \
++ float16x4_t \
++ MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b) \
++ { \
++ return MSTRCAT (insn, _f16) (a, b); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b) \
++ { \
++ return MSTRCAT (insn, q_f16) (a, b); \
++ }
++
++#define BINOP_LANE_TEST(insn, I) \
++ float16x4_t \
++ MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b) \
++ { \
++ return MSTRCAT (insn, _lane_f16) (a, b, I); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b) \
++ { \
++ return MSTRCAT (insn, q_lane_f16) (a, b, I); \
++ }
++
++#define BINOP_LANEQ_TEST(insn, I) \
++ float16x4_t \
++ MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b) \
++ { \
++ return MSTRCAT (insn, _laneq_f16) (a, b, I); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b) \
++ { \
++ return MSTRCAT (insn, q_laneq_f16) (a, b, I); \
++ } \
++
++#define BINOP_N_TEST(insn) \
++ float16x4_t \
++ MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b) \
++ { \
++ return MSTRCAT (insn, _n_f16) (a, b); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b) \
++ { \
++ return MSTRCAT (insn, q_n_f16) (a, b); \
++ }
++
++#define TERNOP_TEST(insn) \
++ float16_t \
++ MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c) \
++ { \
++ return MSTRCAT (insn, h_f16) (a, b, c); \
++ } \
++ float16x4_t \
++ MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b, \
++ float16x4_t c) \
++ { \
++ return MSTRCAT (insn, _f16) (a, b, c); \
++ } \
++ float16x8_t \
++ MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b, \
++ float16x8_t c) \
++ { \
++ return MSTRCAT (insn, q_f16) (a, b, c); \
++ }
++
++#define VCMP1_TEST(insn) \
++ uint16x4_t \
++ MSTRCAT (test_##insn, _16x4) (float16x4_t a) \
++ { \
++ return MSTRCAT (insn, _f16) (a); \
++ } \
++ uint16x8_t \
++ MSTRCAT (test_##insn, _16x8) (float16x8_t a) \
++ { \
++ return MSTRCAT (insn, q_f16) (a); \
++ }
++
++#define VCMP2_TEST(insn) \
++ uint16x4_t \
++ MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b) \
++ { \
++ return MSTRCAT (insn, _f16) (a, b); \
++ } \
++ uint16x8_t \
++ MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b) \
++ { \
++ return MSTRCAT (insn, q_f16) (a, b); \
++ }
++
++#define VCVT_TEST(insn, TY, TO, FR) \
++ MSTRCAT (TO, 16x4_t) \
++ MSTRCAT (test_##insn, TY) (MSTRCAT (FR, 16x4_t) a) \
++ { \
++ return MSTRCAT (insn, TY) (a); \
++ } \
++ MSTRCAT (TO, 16x8_t) \
++ MSTRCAT (test_##insn##_q, TY) (MSTRCAT (FR, 16x8_t) a) \
++ { \
++ return MSTRCAT (insn, q##TY) (a); \
++ }
+
-+/* Expected results: vreinterpretq_*_p128. */
-+VECT_VAR_DECL(vreint_expected_q_s8_p128,int,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_s16_p128,int,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_s32_p128,int,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_s64_p128,int,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_u8_p128,uint,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_u16_p128,uint,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_u32_p128,uint,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_u64_p128,uint,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p8_p128,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_p16_p128,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_p64_p128,uint,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_f32_p128,hfloat,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_f16_p128,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
++#define VCVT_N_TEST(insn, TY, TO, FR) \
++ MSTRCAT (TO, 16x4_t) \
++ MSTRCAT (test_##insn##_n, TY) (MSTRCAT (FR, 16x4_t) a) \
++ { \
++ return MSTRCAT (insn, _n##TY) (a, 1); \
++ } \
++ MSTRCAT (TO, 16x8_t) \
++ MSTRCAT (test_##insn##_n_q, TY) (MSTRCAT (FR, 16x8_t) a) \
++ { \
++ return MSTRCAT (insn, q_n##TY) (a, 1); \
++ }
+
-+int main (void)
++VCMP1_TEST (vceqz)
++/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-0]+, #0} 1 } } */
++/* { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, #0} 1 } } */
++
++VCMP1_TEST (vcgtz)
++/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, #0} 1 } } */
++/* { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, #0} 1 } } */
++
++VCMP1_TEST (vcgez)
++/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, #0} 1 } } */
++/* { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, #0} 1 } } */
++
++VCMP1_TEST (vcltz)
++/* { dg-final { scan-assembler-times {vclt.f16\td[0-9]+, d[0-9]+, #0} 1 } } */
++/* { dg-final { scan-assembler-times {vclt.f16\tq[0-9]+, q[0-9]+, #0} 1 } } */
++
++VCMP1_TEST (vclez)
++/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, #0} 1 } } */
++/* { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, #0} 1 } } */
++
++VCVT_TEST (vcvt, _f16_s16, float, int)
++VCVT_N_TEST (vcvt, _f16_s16, float, int)
++/* { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.s16\td[0-9]+, d[0-9]+, #1} 1 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.s16\tq[0-9]+, q[0-9]+, #1} 1 } } */
++
++VCVT_TEST (vcvt, _f16_u16, float, uint)
++VCVT_N_TEST (vcvt, _f16_u16, float, uint)
++/* { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.u16\td[0-9]+, d[0-9]+, #1} 1 } }
++ { dg-final { scan-assembler-times {vcvt\.f16\.u16\tq[0-9]+, q[0-9]+, #1} 1 } } */
++
++VCVT_TEST (vcvt, _s16_f16, int, float)
++VCVT_N_TEST (vcvt, _s16_f16, int, float)
++/* { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.s16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
++ { dg-final { scan-assembler-times {vcvt\.s16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } } */
++
++VCVT_TEST (vcvt, _u16_f16, uint, float)
++VCVT_N_TEST (vcvt, _u16_f16, uint, float)
++/* { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+} 2 } }
++ { dg-final { scan-assembler-times {vcvt\.u16\.f16\td[0-9]+, d[0-9]+, #1} 1 } }
++ { dg-final { scan-assembler-times {vcvt\.u16\.f16\tq[0-9]+, q[0-9]+, #1} 1 } } */
++
++VCVT_TEST (vcvta, _s16_f16, int, float)
++/* { dg-final { scan-assembler-times {vcvta\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvta\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvta, _u16_f16, uint, float)
++/* { dg-final { scan-assembler-times {vcvta\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvta\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtm, _s16_f16, int, float)
++/* { dg-final { scan-assembler-times {vcvtm\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtm\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtm, _u16_f16, uint, float)
++/* { dg-final { scan-assembler-times {vcvtm\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtm\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtn, _s16_f16, int, float)
++/* { dg-final { scan-assembler-times {vcvtn\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtn\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtn, _u16_f16, uint, float)
++/* { dg-final { scan-assembler-times {vcvtn\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtn\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtp, _s16_f16, int, float)
++/* { dg-final { scan-assembler-times {vcvtp\.s16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtp\.s16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++VCVT_TEST (vcvtp, _u16_f16, uint, float)
++/* { dg-final { scan-assembler-times {vcvtp\.u16\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcvtp\.u16\.f16\tq[0-9]+, q[0-9]+} 1 } }
++*/
++
++UNOP_TEST (vabs)
++/* { dg-final { scan-assembler-times {vabs\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vabs\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vneg)
++/* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrecpe)
++/* { dg-final { scan-assembler-times {vrecpe\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrecpe\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrnd)
++/* { dg-final { scan-assembler-times {vrintz\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrintz\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrnda)
++/* { dg-final { scan-assembler-times {vrinta\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrinta\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrndm)
++/* { dg-final { scan-assembler-times {vrintm\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrintm\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrndn)
++/* { dg-final { scan-assembler-times {vrintn\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrintn\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrndp)
++/* { dg-final { scan-assembler-times {vrintp\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrintp\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrndx)
++/* { dg-final { scan-assembler-times {vrintx\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrintx\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++UNOP_TEST (vrsqrte)
++/* { dg-final { scan-assembler-times {vrsqrte\.f16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrsqrte\.f16\tq[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vadd)
++/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vabd)
++/* { dg-final { scan-assembler-times {vabd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vabd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcage)
++/* { dg-final { scan-assembler-times {vacge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vacge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcagt)
++/* { dg-final { scan-assembler-times {vacgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vacgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcale)
++/* { dg-final { scan-assembler-times {vacle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vacle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcalt)
++/* { dg-final { scan-assembler-times {vaclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vaclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vceq)
++/* { dg-final { scan-assembler-times {vceq\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vceq\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcge)
++/* { dg-final { scan-assembler-times {vcge\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcge\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcgt)
++/* { dg-final { scan-assembler-times {vcgt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcgt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vcle)
++/* { dg-final { scan-assembler-times {vcle\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vcle\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++VCMP2_TEST (vclt)
++/* { dg-final { scan-assembler-times {vclt\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vclt\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vmax)
++/* { dg-final { scan-assembler-times {vmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vmax\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vmin)
++/* { dg-final { scan-assembler-times {vmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vmin\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vmaxnm)
++/* { dg-final { scan-assembler-times {vmaxnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vmaxnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vminnm)
++/* { dg-final { scan-assembler-times {vminnm\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vminnm\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++
++BINOP_TEST (vmul)
++/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 3 } }
++ { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
++BINOP_LANE_TEST (vmul, 2)
++/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[2\]} 1 } }
++ { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[2\]} 1 } } */
++BINOP_N_TEST (vmul)
++/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\]} 1 } }
++ { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\]} 1 } }*/
++
++float16x4_t
++test_vpadd_16x4 (float16x4_t a, float16x4_t b)
+{
-+ DECL_VARIABLE_128BITS_VARIANTS(vreint_vector);
-+ DECL_VARIABLE(vreint_vector, poly, 64, 2);
-+ DECL_VARIABLE_128BITS_VARIANTS(vreint_vector_res);
-+ DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
++ return vpadd_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vpadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
-+ clean_results ();
++float16x4_t
++test_vpmax_16x4 (float16x4_t a, float16x4_t b)
++{
++ return vpmax_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vpmax\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
-+ TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vreint_vector, buffer);
-+ VLOAD(vreint_vector, buffer, q, poly, p, 64, 2);
-+ VLOAD(vreint_vector, buffer, q, float, f, 16, 8);
-+ VLOAD(vreint_vector, buffer, q, float, f, 32, 4);
++float16x4_t
++test_vpmin_16x4 (float16x4_t a, float16x4_t b)
++{
++ return vpmin_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vpmin\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
+
-+ /* vreinterpretq_p128_* tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VREINTERPRETQ_P128_*"
++BINOP_TEST (vsub)
++/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+ /* Since there is no way to store a poly128_t value, convert to
-+ poly64x2_t before storing. This means that we are not able to
-+ test vreinterpretq_p128* alone, and that errors in
-+ vreinterpretq_p64_p128 could compensate for errors in
-+ vreinterpretq_p128*. */
-+#define TEST_VREINTERPRET128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
-+ VECT_VAR(vreint_vector_res, poly, 64, 2) = vreinterpretq_p64_p128( \
-+ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS))); \
-+ vst1##Q##_##T2##64(VECT_VAR(result, poly, 64, 2), \
-+ VECT_VAR(vreint_vector_res, poly, 64, 2)); \
-+ CHECK(TEST_MSG, T1, 64, 2, PRIx##64, EXPECTED, "");
++BINOP_TEST (vrecps)
++/* { dg-final { scan-assembler-times {vrecps\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrecps\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 8, 16, vreint_expected_q_p128_s8);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 16, 8, vreint_expected_q_p128_s16);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 32, 4, vreint_expected_q_p128_s32);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, int, s, 64, 2, vreint_expected_q_p128_s64);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 8, 16, vreint_expected_q_p128_u8);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 16, 8, vreint_expected_q_p128_u16);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 32, 4, vreint_expected_q_p128_u32);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, uint, u, 64, 2, vreint_expected_q_p128_u64);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, poly, p, 8, 16, vreint_expected_q_p128_p8);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, poly, p, 16, 8, vreint_expected_q_p128_p16);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, float, f, 16, 8, vreint_expected_q_p128_f16);
-+ TEST_VREINTERPRET128(q, poly, p, 128, 1, float, f, 32, 4, vreint_expected_q_p128_f32);
++BINOP_TEST (vrsqrts)
++/* { dg-final { scan-assembler-times {vrsqrts\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrsqrts\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+ /* vreinterpretq_*_p128 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VREINTERPRETQ_*_P128"
++TERNOP_TEST (vfma)
++/* { dg-final { scan-assembler-times {vfma\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vfma\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+ /* Since there is no way to load a poly128_t value, load a
-+ poly64x2_t and convert it to poly128_t. This means that we are
-+ not able to test vreinterpretq_*_p128 alone, and that errors in
-+ vreinterpretq_p128_p64 could compensate for errors in
-+ vreinterpretq_*_p128*. */
-+#define TEST_VREINTERPRET_FROM_P128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
-+ VECT_VAR(vreint_vector_res, T1, W, N) = \
-+ vreinterpret##Q##_##T2##W##_##TS2##WS( \
-+ vreinterpretq_p128_p64(VECT_VAR(vreint_vector, TS1, 64, 2))); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
-+ VECT_VAR(vreint_vector_res, T1, W, N)); \
-+ CHECK(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++TERNOP_TEST (vfms)
++/* { dg-final { scan-assembler-times {vfms\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vfms\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+#define TEST_VREINTERPRET_FP_FROM_P128(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
-+ VECT_VAR(vreint_vector_res, T1, W, N) = \
-+ vreinterpret##Q##_##T2##W##_##TS2##WS( \
-+ vreinterpretq_p128_p64(VECT_VAR(vreint_vector, TS1, 64, 2))); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
-+ VECT_VAR(vreint_vector_res, T1, W, N)); \
-+ CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++float16x4_t
++test_vmov_n_f16 (float16_t a)
++{
++ return vmov_n_f16 (a);
++}
+
-+ TEST_VREINTERPRET_FROM_P128(q, int, s, 8, 16, poly, p, 128, 1, vreint_expected_q_s8_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, int, s, 16, 8, poly, p, 128, 1, vreint_expected_q_s16_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, int, s, 32, 4, poly, p, 128, 1, vreint_expected_q_s32_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, int, s, 64, 2, poly, p, 128, 1, vreint_expected_q_s64_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, uint, u, 8, 16, poly, p, 128, 1, vreint_expected_q_u8_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, uint, u, 16, 8, poly, p, 128, 1, vreint_expected_q_u16_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, uint, u, 32, 4, poly, p, 128, 1, vreint_expected_q_u32_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, uint, u, 64, 2, poly, p, 128, 1, vreint_expected_q_u64_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, poly, p, 8, 16, poly, p, 128, 1, vreint_expected_q_p8_p128);
-+ TEST_VREINTERPRET_FROM_P128(q, poly, p, 16, 8, poly, p, 128, 1, vreint_expected_q_p16_p128);
-+ TEST_VREINTERPRET_FP_FROM_P128(q, float, f, 16, 8, poly, p, 128, 1, vreint_expected_q_f16_p128);
-+ TEST_VREINTERPRET_FP_FROM_P128(q, float, f, 32, 4, poly, p, 128, 1, vreint_expected_q_f32_p128);
++float16x4_t
++test_vdup_n_f16 (float16_t a)
++{
++ return vdup_n_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, r[0-9]+} 2 } } */
+
-+ return 0;
++float16x8_t
++test_vmovq_n_f16 (float16_t a)
++{
++ return vmovq_n_f16 (a);
+}
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
-@@ -0,0 +1,202 @@
-+/* This file contains tests for the vreinterpret *p64 intrinsics. */
+
-+/* { dg-require-effective-target arm_crypto_ok } */
-+/* { dg-add-options arm_crypto } */
++float16x8_t
++test_vdupq_n_f16 (float16_t a)
++{
++ return vdupq_n_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, r[0-9]+} 2 } } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++float16x4_t
++test_vdup_lane_f16 (float16x4_t a)
++{
++ return vdup_lane_f16 (a, 1);
++}
++/* { dg-final { scan-assembler-times {vdup\.16\td[0-9]+, d[0-9]+\[1\]} 1 } } */
+
-+/* Expected results: vreinterpret_p64_*. */
-+VECT_VAR_DECL(vreint_expected_p64_s8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
-+VECT_VAR_DECL(vreint_expected_p64_s16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
-+VECT_VAR_DECL(vreint_expected_p64_s32,poly,64,1) [] = { 0xfffffff1fffffff0 };
-+VECT_VAR_DECL(vreint_expected_p64_s64,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vreint_expected_p64_u8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
-+VECT_VAR_DECL(vreint_expected_p64_u16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
-+VECT_VAR_DECL(vreint_expected_p64_u32,poly,64,1) [] = { 0xfffffff1fffffff0 };
-+VECT_VAR_DECL(vreint_expected_p64_u64,poly,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vreint_expected_p64_p8,poly,64,1) [] = { 0xf7f6f5f4f3f2f1f0 };
-+VECT_VAR_DECL(vreint_expected_p64_p16,poly,64,1) [] = { 0xfff3fff2fff1fff0 };
-+VECT_VAR_DECL(vreint_expected_p64_f32,poly,64,1) [] = { 0xc1700000c1800000 };
-+VECT_VAR_DECL(vreint_expected_p64_f16,poly,64,1) [] = { 0xca80cb00cb80cc00 };
++float16x8_t
++test_vdupq_lane_f16 (float16x4_t a)
++{
++ return vdupq_lane_f16 (a, 1);
++}
++/* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, d[0-9]+\[1\]} 1 } } */
+
-+/* Expected results: vreinterpretq_p64_*. */
-+VECT_VAR_DECL(vreint_expected_q_p64_s8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p64_s16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p64_s32,poly,64,2) [] = { 0xfffffff1fffffff0,
-+ 0xfffffff3fffffff2 };
-+VECT_VAR_DECL(vreint_expected_q_p64_s64,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p64_u8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p64_u16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p64_u32,poly,64,2) [] = { 0xfffffff1fffffff0,
-+ 0xfffffff3fffffff2 };
-+VECT_VAR_DECL(vreint_expected_q_p64_u64,poly,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p64_p8,poly,64,2) [] = { 0xf7f6f5f4f3f2f1f0,
-+ 0xfffefdfcfbfaf9f8 };
-+VECT_VAR_DECL(vreint_expected_q_p64_p16,poly,64,2) [] = { 0xfff3fff2fff1fff0,
-+ 0xfff7fff6fff5fff4 };
-+VECT_VAR_DECL(vreint_expected_q_p64_f32,poly,64,2) [] = { 0xc1700000c1800000,
-+ 0xc1500000c1600000 };
-+VECT_VAR_DECL(vreint_expected_q_p64_f16,poly,64,2) [] = { 0xca80cb00cb80cc00,
-+ 0xc880c900c980ca00 };
++float16x4_t
++test_vext_f16 (float16x4_t a, float16x4_t b)
++{
++ return vext_f16 (a, b, 1);
++}
++/* { dg-final { scan-assembler-times {vext\.16\td[0-9]+, d[0-9]+, d[0-9]+, #1} 1 } } */
+
-+/* Expected results: vreinterpret_*_p64. */
-+VECT_VAR_DECL(vreint_expected_s8_p64,int,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_s16_p64,int,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_s32_p64,int,32,2) [] = { 0xfffffff0, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_s64_p64,int,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vreint_expected_u8_p64,uint,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_u16_p64,uint,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_u32_p64,uint,32,2) [] = { 0xfffffff0, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_u64_p64,uint,64,1) [] = { 0xfffffffffffffff0 };
-+VECT_VAR_DECL(vreint_expected_p8_p64,poly,8,8) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_p16_p64,poly,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_f32_p64,hfloat,32,2) [] = { 0xfffffff0, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_f16_p64,hfloat,16,4) [] = { 0xfff0, 0xffff, 0xffff, 0xffff };
++float16x8_t
++test_vextq_f16 (float16x8_t a, float16x8_t b)
++{
++ return vextq_f16 (a, b, 1);
++}
++/* { dg-final { scan-assembler-times {vext\.16\tq[0-9]+, q[0-9]+, q[0-9]+, #1} 1 } } */
+
-+/* Expected results: vreinterpretq_*_p64. */
-+VECT_VAR_DECL(vreint_expected_q_s8_p64,int,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_s16_p64,int,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_s32_p64,int,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_s64_p64,int,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_u8_p64,uint,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_u16_p64,uint,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_u32_p64,uint,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_u64_p64,uint,64,2) [] = { 0xfffffffffffffff0,
-+ 0xfffffffffffffff1 };
-+VECT_VAR_DECL(vreint_expected_q_p8_p64,poly,8,16) [] = { 0xf0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xf1, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(vreint_expected_q_p16_p64,poly,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
-+VECT_VAR_DECL(vreint_expected_q_f32_p64,hfloat,32,4) [] = { 0xfffffff0, 0xffffffff,
-+ 0xfffffff1, 0xffffffff };
-+VECT_VAR_DECL(vreint_expected_q_f16_p64,hfloat,16,8) [] = { 0xfff0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xfff1, 0xffff,
-+ 0xffff, 0xffff };
++UNOP_TEST (vrev64)
++/* { dg-final { scan-assembler-times {vrev64\.16\td[0-9]+, d[0-9]+} 1 } }
++ { dg-final { scan-assembler-times {vrev64\.16\tq[0-9]+, q[0-9]+} 1 } } */
++
++float16x4_t
++test_vbsl16x4 (uint16x4_t a, float16x4_t b, float16x4_t c)
++{
++ return vbsl_f16 (a, b, c);
++}
++/* { dg-final { scan-assembler-times {vbsl\td[0-9]+, d[0-9]+, d[0-9]+} 1 } } */
++
++float16x8_t
++test_vbslq16x8 (uint16x8_t a, float16x8_t b, float16x8_t c)
++{
++ return vbslq_f16 (a, b, c);
++}
++/*{ dg-final { scan-assembler-times {vbsl\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+
-+int main (void)
++float16x4x2_t
++test_vzip16x4 (float16x4_t a, float16x4_t b)
+{
-+#define TEST_VREINTERPRET(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
-+ VECT_VAR(vreint_vector_res, T1, W, N) = \
-+ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS)); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
-+ VECT_VAR(vreint_vector_res, T1, W, N)); \
-+ CHECK(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++ return vzip_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vzip\.16\td[0-9]+, d[0-9]+} 1 } } */
+
-+#define TEST_VREINTERPRET_FP(Q, T1, T2, W, N, TS1, TS2, WS, NS, EXPECTED) \
-+ VECT_VAR(vreint_vector_res, T1, W, N) = \
-+ vreinterpret##Q##_##T2##W##_##TS2##WS(VECT_VAR(vreint_vector, TS1, WS, NS)); \
-+ vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
-+ VECT_VAR(vreint_vector_res, T1, W, N)); \
-+ CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
++float16x8x2_t
++test_vzipq16x8 (float16x8_t a, float16x8_t b)
++{
++ return vzipq_f16 (a, b);
++}
++/*{ dg-final { scan-assembler-times {vzip\.16\tq[0-9]+, q[0-9]+} 1 } } */
+
-+ DECL_VARIABLE_ALL_VARIANTS(vreint_vector);
-+ DECL_VARIABLE(vreint_vector, poly, 64, 1);
-+ DECL_VARIABLE(vreint_vector, poly, 64, 2);
-+ DECL_VARIABLE_ALL_VARIANTS(vreint_vector_res);
-+ DECL_VARIABLE(vreint_vector_res, poly, 64, 1);
-+ DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
++float16x4x2_t
++test_vuzp16x4 (float16x4_t a, float16x4_t b)
++{
++ return vuzp_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vuzp\.16\td[0-9]+, d[0-9]+} 1 } } */
+
-+ clean_results ();
++float16x8x2_t
++test_vuzpq16x8 (float16x8_t a, float16x8_t b)
++{
++ return vuzpq_f16 (a, b);
++}
++/*{ dg-final { scan-assembler-times {vuzp\.16\tq[0-9]+, q[0-9]+} 1 } } */
+
-+ TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vreint_vector, buffer);
-+ VLOAD(vreint_vector, buffer, , poly, p, 64, 1);
-+ VLOAD(vreint_vector, buffer, q, poly, p, 64, 2);
-+ VLOAD(vreint_vector, buffer, , float, f, 16, 4);
-+ VLOAD(vreint_vector, buffer, q, float, f, 16, 8);
-+ VLOAD(vreint_vector, buffer, , float, f, 32, 2);
-+ VLOAD(vreint_vector, buffer, q, float, f, 32, 4);
++float16x4x2_t
++test_vtrn16x4 (float16x4_t a, float16x4_t b)
++{
++ return vtrn_f16 (a, b);
++}
++/* { dg-final { scan-assembler-times {vtrn\.16\td[0-9]+, d[0-9]+} 1 } } */
+
-+ /* vreinterpret_p64_* tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VREINTERPRET_P64_*"
-+ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 8, 8, vreint_expected_p64_s8);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 16, 4, vreint_expected_p64_s16);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 32, 2, vreint_expected_p64_s32);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, int, s, 64, 1, vreint_expected_p64_s64);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 8, 8, vreint_expected_p64_u8);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 16, 4, vreint_expected_p64_u16);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 32, 2, vreint_expected_p64_u32);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, uint, u, 64, 1, vreint_expected_p64_u64);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, poly, p, 8, 8, vreint_expected_p64_p8);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, poly, p, 16, 4, vreint_expected_p64_p16);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, float, f, 16, 4, vreint_expected_p64_f16);
-+ TEST_VREINTERPRET(, poly, p, 64, 1, float, f, 32, 2, vreint_expected_p64_f32);
++float16x8x2_t
++test_vtrnq16x8 (float16x8_t a, float16x8_t b)
++{
++ return vtrnq_f16 (a, b);
++}
++/*{ dg-final { scan-assembler-times {vtrn\.16\tq[0-9]+, q[0-9]+} 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
+@@ -0,0 +1,203 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++/* Test instructions generated for the FP16 scalar intrinsics. */
++#include <arm_fp16.h>
++
++#define MSTRCAT(L, str) L##str
++
++#define UNOP_TEST(insn) \
++ float16_t \
++ MSTRCAT (test_##insn, 16) (float16_t a) \
++ { \
++ return MSTRCAT (insn, h_f16) (a); \
++ }
+
-+ /* vreinterpretq_p64_* tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VREINTERPRETQ_P64_*"
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 8, 16, vreint_expected_q_p64_s8);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 16, 8, vreint_expected_q_p64_s16);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 32, 4, vreint_expected_q_p64_s32);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, int, s, 64, 2, vreint_expected_q_p64_s64);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 8, 16, vreint_expected_q_p64_u8);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 16, 8, vreint_expected_q_p64_u16);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 32, 4, vreint_expected_q_p64_u32);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, uint, u, 64, 2, vreint_expected_q_p64_u64);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, poly, p, 8, 16, vreint_expected_q_p64_p8);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, poly, p, 16, 8, vreint_expected_q_p64_p16);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, float, f, 16, 8, vreint_expected_q_p64_f16);
-+ TEST_VREINTERPRET(q, poly, p, 64, 2, float, f, 32, 4, vreint_expected_q_p64_f32);
++#define BINOP_TEST(insn) \
++ float16_t \
++ MSTRCAT (test_##insn, 16) (float16_t a, float16_t b) \
++ { \
++ return MSTRCAT (insn, h_f16) (a, b); \
++ }
+
-+ /* vreinterpret_*_p64 tests. */
-+#undef TEST_MSG
-+#define TEST_MSG "VREINTERPRET_*_P64"
++#define TERNOP_TEST(insn) \
++ float16_t \
++ MSTRCAT (test_##insn, 16) (float16_t a, float16_t b, float16_t c) \
++ { \
++ return MSTRCAT (insn, h_f16) (a, b, c); \
++ }
+
-+ TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 64, 1, vreint_expected_s8_p64);
-+ TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 64, 1, vreint_expected_s16_p64);
-+ TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 64, 1, vreint_expected_s32_p64);
-+ TEST_VREINTERPRET(, int, s, 64, 1, poly, p, 64, 1, vreint_expected_s64_p64);
-+ TEST_VREINTERPRET(, uint, u, 8, 8, poly, p, 64, 1, vreint_expected_u8_p64);
-+ TEST_VREINTERPRET(, uint, u, 16, 4, poly, p, 64, 1, vreint_expected_u16_p64);
-+ TEST_VREINTERPRET(, uint, u, 32, 2, poly, p, 64, 1, vreint_expected_u32_p64);
-+ TEST_VREINTERPRET(, uint, u, 64, 1, poly, p, 64, 1, vreint_expected_u64_p64);
-+ TEST_VREINTERPRET(, poly, p, 8, 8, poly, p, 64, 1, vreint_expected_p8_p64);
-+ TEST_VREINTERPRET(, poly, p, 16, 4, poly, p, 64, 1, vreint_expected_p16_p64);
-+ TEST_VREINTERPRET_FP(, float, f, 16, 4, poly, p, 64, 1, vreint_expected_f16_p64);
-+ TEST_VREINTERPRET_FP(, float, f, 32, 2, poly, p, 64, 1, vreint_expected_f32_p64);
-+ TEST_VREINTERPRET(q, int, s, 8, 16, poly, p, 64, 2, vreint_expected_q_s8_p64);
-+ TEST_VREINTERPRET(q, int, s, 16, 8, poly, p, 64, 2, vreint_expected_q_s16_p64);
-+ TEST_VREINTERPRET(q, int, s, 32, 4, poly, p, 64, 2, vreint_expected_q_s32_p64);
-+ TEST_VREINTERPRET(q, int, s, 64, 2, poly, p, 64, 2, vreint_expected_q_s64_p64);
-+ TEST_VREINTERPRET(q, uint, u, 8, 16, poly, p, 64, 2, vreint_expected_q_u8_p64);
-+ TEST_VREINTERPRET(q, uint, u, 16, 8, poly, p, 64, 2, vreint_expected_q_u16_p64);
-+ TEST_VREINTERPRET(q, uint, u, 32, 4, poly, p, 64, 2, vreint_expected_q_u32_p64);
-+ TEST_VREINTERPRET(q, uint, u, 64, 2, poly, p, 64, 2, vreint_expected_q_u64_p64);
-+ TEST_VREINTERPRET(q, poly, p, 8, 16, poly, p, 64, 2, vreint_expected_q_p8_p64);
-+ TEST_VREINTERPRET(q, poly, p, 16, 8, poly, p, 64, 2, vreint_expected_q_p16_p64);
-+ TEST_VREINTERPRET_FP(q, float, f, 16, 8, poly, p, 64, 2, vreint_expected_q_f16_p64);
-+ TEST_VREINTERPRET_FP(q, float, f, 32, 4, poly, p, 64, 2, vreint_expected_q_f32_p64);
++float16_t
++test_vcvth_f16_s32 (int32_t a)
++{
++ return vcvth_f16_s32 (a);
++}
+
-+ return 0;
++float16_t
++test_vcvth_n_f16_s32 (int32_t a)
++{
++ return vcvth_n_f16_s32 (a, 1);
+}
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+} 2 } } */
++/* { dg-final { scan-assembler-times {vcvt\.f16\.s32\ts[0-9]+, s[0-9]+, #1} 1 } } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++float16_t
++test_vcvth_f16_u32 (uint32_t a)
++{
++ return vcvth_f16_u32 (a);
++}
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++float16_t
++test_vcvth_n_f16_u32 (uint32_t a)
++{
++ return vcvth_n_f16_u32 (a, 1);
++}
+
-+#define INSN vrnd
-+#define TEST_MSG "VRND"
++/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+} 2 } } */
++/* { dg-final { scan-assembler-times {vcvt\.f16\.u32\ts[0-9]+, s[0-9]+, #1} 1 } } */
+
-+#include "vrndX.inc"
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndX.inc
-@@ -0,0 +1,43 @@
-+#define FNNAME1(NAME) exec_ ## NAME
-+#define FNNAME(NAME) FNNAME1 (NAME)
++uint32_t
++test_vcvth_u32_f16 (float16_t a)
++{
++ return vcvth_u32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+} 2 } } */
+
-+void FNNAME (INSN) (void)
++uint32_t
++test_vcvth_n_u32_f16 (float16_t a)
+{
-+ /* vector_res = vrndX (vector), then store the result. */
-+#define TEST_VRND2(INSN, Q, T1, T2, W, N) \
-+ VECT_VAR (vector_res, T1, W, N) = \
-+ INSN##Q##_##T2##W (VECT_VAR (vector, T1, W, N)); \
-+ vst1##Q##_##T2##W (VECT_VAR (result, T1, W, N), \
-+ VECT_VAR (vector_res, T1, W, N))
++ return vcvth_n_u32_f16 (a, 1);
++}
++/* { dg-final { scan-assembler-times {vcvt\.u32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } } */
+
-+ /* Two auxliary macros are necessary to expand INSN. */
-+#define TEST_VRND1(INSN, Q, T1, T2, W, N) \
-+ TEST_VRND2 (INSN, Q, T1, T2, W, N)
++int32_t
++test_vcvth_s32_f16 (float16_t a)
++{
++ return vcvth_s32_f16 (a);
++}
+
-+#define TEST_VRND(Q, T1, T2, W, N) \
-+ TEST_VRND1 (INSN, Q, T1, T2, W, N)
++int32_t
++test_vcvth_n_s32_f16 (float16_t a)
++{
++ return vcvth_n_s32_f16 (a, 1);
++}
+
-+ DECL_VARIABLE (vector, float, 32, 2);
-+ DECL_VARIABLE (vector, float, 32, 4);
++/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+} 2 } } */
++/* { dg-final { scan-assembler-times {vcvt\.s32\.f16\ts[0-9]+, s[0-9]+, #1} 1 } } */
+
-+ DECL_VARIABLE (vector_res, float, 32, 2);
-+ DECL_VARIABLE (vector_res, float, 32, 4);
++int32_t
++test_vcvtah_s32_f16 (float16_t a)
++{
++ return vcvtah_s32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvta\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+ clean_results ();
++uint32_t
++test_vcvtah_u32_f16 (float16_t a)
++{
++ return vcvtah_u32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvta\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+ VLOAD (vector, buffer, , float, f, 32, 2);
-+ VLOAD (vector, buffer, q, float, f, 32, 4);
++int32_t
++test_vcvtmh_s32_f16 (float16_t a)
++{
++ return vcvtmh_s32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvtm\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+ TEST_VRND ( , float, f, 32, 2);
-+ TEST_VRND (q, float, f, 32, 4);
++uint32_t
++test_vcvtmh_u32_f16 (float16_t a)
++{
++ return vcvtmh_u32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvtm\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
++ */
+
-+ CHECK_FP (TEST_MSG, float, 32, 2, PRIx32, expected, "");
-+ CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
++int32_t
++test_vcvtnh_s32_f16 (float16_t a)
++{
++ return vcvtnh_s32_f16 (a);
+}
++/* { dg-final { scan-assembler-times {vcvtn\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
++ */
+
-+int
-+main (void)
++uint32_t
++test_vcvtnh_u32_f16 (float16_t a)
+{
-+ FNNAME (INSN) ();
-+ return 0;
++ return vcvtnh_u32_f16 (a);
+}
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++/* { dg-final { scan-assembler-times {vcvtn\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
++ */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++int32_t
++test_vcvtph_s32_f16 (float16_t a)
++{
++ return vcvtph_s32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvtp\.s32\.f16\ts[0-9]+, s[0-9]+} 1 } }
++ */
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++uint32_t
++test_vcvtph_u32_f16 (float16_t a)
++{
++ return vcvtph_u32_f16 (a);
++}
++/* { dg-final { scan-assembler-times {vcvtp\.u32\.f16\ts[0-9]+, s[0-9]+} 1 } }
++ */
+
-+#define INSN vrnda
-+#define TEST_MSG "VRNDA"
++UNOP_TEST (vabs)
++/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#include "vrndX.inc"
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++UNOP_TEST (vneg)
++/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++UNOP_TEST (vrnd)
++/* { dg-final { scan-assembler-times {vrintz\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++UNOP_TEST (vrndi)
++/* { dg-final { scan-assembler-times {vrintr\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#define INSN vrndm
-+#define TEST_MSG "VRNDM"
++UNOP_TEST (vrnda)
++/* { dg-final { scan-assembler-times {vrinta\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#include "vrndX.inc"
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++UNOP_TEST (vrndm)
++/* { dg-final { scan-assembler-times {vrintm\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++UNOP_TEST (vrndn)
++/* { dg-final { scan-assembler-times {vrintn\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++UNOP_TEST (vrndp)
++/* { dg-final { scan-assembler-times {vrintp\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#define INSN vrndn
-+#define TEST_MSG "VRNDN"
++UNOP_TEST (vrndx)
++/* { dg-final { scan-assembler-times {vrintx\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+
-+#include "vrndX.inc"
++UNOP_TEST (vsqrt)
++/* { dg-final { scan-assembler-times {vsqrt\.f16\ts[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vadd)
++/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vdiv)
++/* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vmaxnm)
++/* { dg-final { scan-assembler-times {vmaxnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vminnm)
++/* { dg-final { scan-assembler-times {vminnm\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vmul)
++/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++BINOP_TEST (vsub)
++/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++TERNOP_TEST (vfma)
++/* { dg-final { scan-assembler-times {vfma\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++TERNOP_TEST (vfms)
++/* { dg-final { scan-assembler-times {vfms\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
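The UNOP_TEST/BINOP_TEST/TERNOP_TEST macros in the hunk above rely on token pasting: MSTRCAT (test_##insn, 16) builds the test function name and MSTRCAT (insn, h_f16) builds the ACLE intrinsic name. A minimal sketch of what a single expansion produces, assuming the scalar intrinsic vabsh_f16 from arm_fp16.h; this is illustrative only and not text from the patch hunk itself:

#include <arm_fp16.h>

/* UNOP_TEST (vabs) expands to roughly this function.  */
float16_t
test_vabs16 (float16_t a)
{
  return vabsh_f16 (a);
}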
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++++ b/src/gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c
+@@ -0,0 +1,71 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
++/* { dg-options "-O2 -std=c11" } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++/* Test compiler use of FP16 instructions. */
++#include <arm_fp16.h>
++
++float16_t
++test_mov_imm_1 (float16_t a)
++{
++ return 1.0;
++}
++
++float16_t
++test_mov_imm_2 (float16_t a)
++{
++ float16_t b = 1.0;
++ return b;
++}
++
++float16_t
++test_vmov_imm_3 (float16_t a)
++{
++ float16_t b = 1.0;
++ return vaddh_f16 (a, b);
++}
++
++float16_t
++test_vmov_imm_4 (float16_t a)
++{
++ return vaddh_f16 (a, 1.0);
++}
++
++/* { dg-final { scan-assembler-times {vmov.f16\ts[0-9]+, #1\.0e\+0} 4 } }
++ { dg-final { scan-assembler-times {vadd.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 2 } } */
++
++float16_t
++test_vmla_1 (float16_t a, float16_t b, float16_t c)
++{
++ return vaddh_f16 (vmulh_f16 (a, b), c);
++}
++/* { dg-final { scan-assembler-times {vmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++float16_t
++test_vmla_2 (float16_t a, float16_t b, float16_t c)
++{
++ return vsubh_f16 (vmulh_f16 (vnegh_f16 (a), b), c);
++}
++/* { dg-final { scan-assembler-times {vnmla\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
++float16_t
++test_vmls_1 (float16_t a, float16_t b, float16_t c)
++{
++ return vsubh_f16 (c, vmulh_f16 (a, b));
++}
++
++float16_t
++test_vmls_2 (float16_t a, float16_t b, float16_t c)
++{
++ return vsubh_f16 (a, vmulh_f16 (b, c));
++}
++/* { dg-final { scan-assembler-times {vmls\.f16} 2 } } */
++
++float16_t
++test_vnmls_1 (float16_t a, float16_t b, float16_t c)
++{
++ return vsubh_f16 (vmulh_f16 (a, b), c);
++}
++/* { dg-final { scan-assembler-times {vnmls\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
++
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2 -fno-ipa-icf" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++#include "../aarch64/atomic-comp-swap-release-acquire.x"
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++/* { dg-final { scan-assembler-times "ldaex" 4 } } */
++/* { dg-final { scan-assembler-times "stlex" 4 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-acq_rel-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
-+#define INSN vrndp
-+#define TEST_MSG "VRNDP"
++#include "../aarch64/atomic-op-acq_rel.x"
+
-+#include "vrndX.inc"
++/* { dg-final { scan-assembler-times "ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "stlex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
-@@ -0,0 +1,16 @@
-+/* { dg-require-effective-target arm_v8_neon_ok } */
-+/* { dg-add-options arm_v8_neon } */
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-acquire-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
-+#include <arm_neon.h>
-+#include "arm-neon-ref.h"
-+#include "compute-ref-data.h"
++#include "../aarch64/atomic-op-acquire.x"
+
-+/* Expected results. */
-+VECT_VAR_DECL (expected, hfloat, 32, 2) [] = { 0xc1800000, 0xc1700000 };
-+VECT_VAR_DECL (expected, hfloat, 32, 4) [] = { 0xc1800000, 0xc1700000,
-+ 0xc1600000, 0xc1500000 };
++/* { dg-final { scan-assembler-times "ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-char-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
-+#define INSN vrndx
-+#define TEST_MSG "VRNDX"
++#include "../aarch64/atomic-op-char.x"
+
-+#include "vrndX.inc"
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshl.c
-@@ -101,10 +101,8 @@ VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 0x7ffffffffffffff,
- 0x7ffffffffffffff };
-
-
--#ifndef INSN_NAME
- #define INSN_NAME vshl
- #define TEST_MSG "VSHL/VSHLQ"
--#endif
-
- #define FNNAME1(NAME) exec_ ## NAME
- #define FNNAME(NAME) FNNAME1(NAME)
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vsli_n.c
-@@ -166,9 +166,11 @@ void vsli_extra(void)
- CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, int, 16, 8, PRIx16, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, int, 32, 4, PRIx32, expected_max_shift, COMMENT);
-+ CHECK(TEST_MSG, int, 64, 2, PRIx64, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_max_shift, COMMENT);
-+ CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_max_shift, COMMENT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_max_shift, COMMENT);
- }
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c
-@@ -14,6 +14,7 @@ VECT_VAR_DECL(expected_st2_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
- VECT_VAR_DECL(expected_st2_0,poly,8,8) [] = { 0xf0, 0xf1, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st2_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
- VECT_VAR_DECL(expected_st2_0,int,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -24,6 +25,8 @@ VECT_VAR_DECL(expected_st2_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
- 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st2_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
- 0x0, 0x0 };
-
-@@ -39,6 +42,7 @@ VECT_VAR_DECL(expected_st2_1,uint,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st2_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,hfloat,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -48,6 +52,8 @@ VECT_VAR_DECL(expected_st2_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st2_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st2_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st2_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Expected results for vst3, chunk 0. */
-@@ -62,6 +68,7 @@ VECT_VAR_DECL(expected_st3_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
- VECT_VAR_DECL(expected_st3_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0 };
-+VECT_VAR_DECL(expected_st3_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0 };
- VECT_VAR_DECL(expected_st3_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
- VECT_VAR_DECL(expected_st3_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -73,6 +80,8 @@ VECT_VAR_DECL(expected_st3_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
- 0xfffffff2, 0x0 };
- VECT_VAR_DECL(expected_st3_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st3_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
- 0xc1600000, 0x0 };
-
-@@ -88,6 +97,7 @@ VECT_VAR_DECL(expected_st3_1,uint,32,2) [] = { 0xfffffff2, 0x0 };
- VECT_VAR_DECL(expected_st3_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st3_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_1,hfloat,32,2) [] = { 0xc1600000, 0x0 };
- VECT_VAR_DECL(expected_st3_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -97,6 +107,8 @@ VECT_VAR_DECL(expected_st3_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st3_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st3_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Expected results for vst3, chunk 2. */
-@@ -111,6 +123,7 @@ VECT_VAR_DECL(expected_st3_2,uint,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st3_2,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,hfloat,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -120,6 +133,8 @@ VECT_VAR_DECL(expected_st3_2,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st3_2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st3_2,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st3_2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Expected results for vst4, chunk 0. */
-@@ -134,6 +149,7 @@ VECT_VAR_DECL(expected_st4_0,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
- VECT_VAR_DECL(expected_st4_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
-+VECT_VAR_DECL(expected_st4_0,hfloat,16,4) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80 };
- VECT_VAR_DECL(expected_st4_0,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
- VECT_VAR_DECL(expected_st4_0,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -145,6 +161,8 @@ VECT_VAR_DECL(expected_st4_0,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
- 0xfffffff2, 0xfffffff3 };
- VECT_VAR_DECL(expected_st4_0,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_0,hfloat,16,8) [] = { 0xcc00, 0xcb80, 0xcb00, 0xca80,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_0,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
- 0xc1600000, 0xc1500000 };
-
-@@ -160,6 +178,7 @@ VECT_VAR_DECL(expected_st4_1,uint,32,2) [] = { 0xfffffff2, 0xfffffff3 };
- VECT_VAR_DECL(expected_st4_1,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_1,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_1,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_1,hfloat,32,2) [] = { 0xc1600000, 0xc1500000 };
- VECT_VAR_DECL(expected_st4_1,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -169,6 +188,8 @@ VECT_VAR_DECL(expected_st4_1,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st4_1,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_1,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_1,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Expected results for vst4, chunk 2. */
-@@ -183,6 +204,7 @@ VECT_VAR_DECL(expected_st4_2,uint,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_2,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,hfloat,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -192,6 +214,8 @@ VECT_VAR_DECL(expected_st4_2,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st4_2,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_2,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Expected results for vst4, chunk 3. */
-@@ -206,6 +230,7 @@ VECT_VAR_DECL(expected_st4_3,uint,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,poly,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,poly,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_3,hfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,hfloat,32,2) [] = { 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-@@ -215,6 +240,8 @@ VECT_VAR_DECL(expected_st4_3,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- VECT_VAR_DECL(expected_st4_3,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,poly,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
- 0x0, 0x0, 0x0, 0x0 };
-+VECT_VAR_DECL(expected_st4_3,hfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
-+ 0x0, 0x0, 0x0, 0x0 };
- VECT_VAR_DECL(expected_st4_3,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
-
- /* Declare additional input buffers as needed. */
-@@ -229,6 +256,7 @@ VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 32, 2);
- VECT_VAR_DECL_INIT(buffer_vld2_lane, uint, 64, 2);
- VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 8, 2);
- VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 16, 2);
-+VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 16, 2);
- VECT_VAR_DECL_INIT(buffer_vld2_lane, float, 32, 2);
-
- /* Input buffers for vld3_lane. */
-@@ -242,6 +270,7 @@ VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 32, 3);
- VECT_VAR_DECL_INIT(buffer_vld3_lane, uint, 64, 3);
- VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 8, 3);
- VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 16, 3);
-+VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 16, 3);
- VECT_VAR_DECL_INIT(buffer_vld3_lane, float, 32, 3);
-
- /* Input buffers for vld4_lane. */
-@@ -255,6 +284,7 @@ VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 32, 4);
- VECT_VAR_DECL_INIT(buffer_vld4_lane, uint, 64, 4);
- VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 8, 4);
- VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 16, 4);
-+VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 16, 4);
- VECT_VAR_DECL_INIT(buffer_vld4_lane, float, 32, 4);
-
- void exec_vstX_lane (void)
-@@ -302,7 +332,7 @@ void exec_vstX_lane (void)
-
- /* We need all variants in 64 bits, but there is no 64x2 variant,
- nor 128 bits vectors of int8/uint8/poly8. */
--#define DECL_ALL_VSTX_LANE(X) \
-+#define DECL_ALL_VSTX_LANE_NO_FP16(X) \
- DECL_VSTX_LANE(int, 8, 8, X); \
- DECL_VSTX_LANE(int, 16, 4, X); \
- DECL_VSTX_LANE(int, 32, 2, X); \
-@@ -319,11 +349,20 @@ void exec_vstX_lane (void)
- DECL_VSTX_LANE(poly, 16, 8, X); \
- DECL_VSTX_LANE(float, 32, 4, X)
-
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+#define DECL_ALL_VSTX_LANE(X) \
-+ DECL_ALL_VSTX_LANE_NO_FP16(X); \
-+ DECL_VSTX_LANE(float, 16, 4, X); \
-+ DECL_VSTX_LANE(float, 16, 8, X)
-+#else
-+#define DECL_ALL_VSTX_LANE(X) DECL_ALL_VSTX_LANE_NO_FP16(X)
-+#endif
++/* { dg-final { scan-assembler-times "ldrexb\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strexb\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-consume-2.c
+@@ -0,0 +1,11 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
- #define DUMMY_ARRAY(V, T, W, N, L) VECT_VAR_DECL(V,T,W,N)[N*L]
-
- /* Use the same lanes regardless of the size of the array (X), for
- simplicity. */
--#define TEST_ALL_VSTX_LANE(X) \
-+#define TEST_ALL_VSTX_LANE_NO_FP16(X) \
- TEST_VSTX_LANE(, int, s, 8, 8, X, 7); \
- TEST_VSTX_LANE(, int, s, 16, 4, X, 2); \
- TEST_VSTX_LANE(, int, s, 32, 2, X, 0); \
-@@ -340,7 +379,16 @@ void exec_vstX_lane (void)
- TEST_VSTX_LANE(q, poly, p, 16, 8, X, 5); \
- TEST_VSTX_LANE(q, float, f, 32, 4, X, 2)
-
--#define TEST_ALL_EXTRA_CHUNKS(X, Y) \
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+#define TEST_ALL_VSTX_LANE(X) \
-+ TEST_ALL_VSTX_LANE_NO_FP16(X); \
-+ TEST_VSTX_LANE(, float, f, 16, 4, X, 2); \
-+ TEST_VSTX_LANE(q, float, f, 16, 8, X, 6)
-+#else
-+#define TEST_ALL_VSTX_LANE(X) TEST_ALL_VSTX_LANE_NO_FP16(X)
-+#endif
++#include "../aarch64/atomic-op-consume.x"
+
-+#define TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y) \
- TEST_EXTRA_CHUNK(int, 8, 8, X, Y); \
- TEST_EXTRA_CHUNK(int, 16, 4, X, Y); \
- TEST_EXTRA_CHUNK(int, 32, 2, X, Y); \
-@@ -357,6 +405,15 @@ void exec_vstX_lane (void)
- TEST_EXTRA_CHUNK(poly, 16, 8, X, Y); \
- TEST_EXTRA_CHUNK(float, 32, 4, X, Y)
-
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+#define TEST_ALL_EXTRA_CHUNKS(X,Y) \
-+ TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y); \
-+ TEST_EXTRA_CHUNK(float, 16, 4, X, Y); \
-+ TEST_EXTRA_CHUNK(float, 16, 8, X, Y)
-+#else
-+#define TEST_ALL_EXTRA_CHUNKS(X,Y) TEST_ALL_EXTRA_CHUNKS_NO_FP16(X, Y)
-+#endif
++/* Scanning for ldaex is a workaround for the consume-ordering issue in PR59448. */
++/* { dg-final { scan-assembler-times "ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-int-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
+
- /* Declare the temporary buffers / variables. */
- DECL_ALL_VSTX_LANE(2);
- DECL_ALL_VSTX_LANE(3);
-@@ -371,12 +428,18 @@ void exec_vstX_lane (void)
- DUMMY_ARRAY(buffer_src, uint, 32, 2, 4);
- DUMMY_ARRAY(buffer_src, poly, 8, 8, 4);
- DUMMY_ARRAY(buffer_src, poly, 16, 4, 4);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ DUMMY_ARRAY(buffer_src, float, 16, 4, 4);
-+#endif
- DUMMY_ARRAY(buffer_src, float, 32, 2, 4);
- DUMMY_ARRAY(buffer_src, int, 16, 8, 4);
- DUMMY_ARRAY(buffer_src, int, 32, 4, 4);
- DUMMY_ARRAY(buffer_src, uint, 16, 8, 4);
- DUMMY_ARRAY(buffer_src, uint, 32, 4, 4);
- DUMMY_ARRAY(buffer_src, poly, 16, 8, 4);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ DUMMY_ARRAY(buffer_src, float, 16, 8, 4);
-+#endif
- DUMMY_ARRAY(buffer_src, float, 32, 4, 4);
-
- /* Check vst2_lane/vst2q_lane. */
-@@ -400,6 +463,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st2_0, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st2_0, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st2_0, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st2_0, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st2_0, CMT);
++#include "../aarch64/atomic-op-int.x"
++
++/* { dg-final { scan-assembler-times "ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-relaxed-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
++
++#include "../aarch64/atomic-op-relaxed.x"
++
++/* { dg-final { scan-assembler-times "ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-release-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
++
++#include "../aarch64/atomic-op-release.x"
++
++/* { dg-final { scan-assembler-times "ldrex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "stlex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-seq_cst-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
++
++#include "../aarch64/atomic-op-seq_cst.x"
++
++/* { dg-final { scan-assembler-times "ldaex\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "stlex\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/atomic-op-short-2.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v8m_main_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v8m_main } */
++
++#include "../aarch64/atomic-op-short.x"
++
++/* { dg-final { scan-assembler-times "ldrexh\tr\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-times "strexh\t...?, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 6 } } */
++/* { dg-final { scan-assembler-not "dmb" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
+@@ -0,0 +1,58 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_v8_2a_fp16_scalar } */
++
++/* Reset fpu to a value compatible with the next pragmas. */
++#pragma GCC target ("fpu=vfp")
++
++#pragma GCC push_options
++#pragma GCC target ("fpu=fp-armv8")
++
++#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
++#error __ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined.
+#endif
-
- TEST_ALL_EXTRA_CHUNKS(2, 1);
- #undef CMT
-@@ -419,6 +486,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st2_1, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st2_1, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st2_1, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st2_1, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st2_1, CMT);
++
++#pragma GCC push_options
++#pragma GCC target ("fpu=neon-fp-armv8")
++
++#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
++#error __ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined.
+#endif
-
-
- /* Check vst3_lane/vst3q_lane. */
-@@ -444,6 +515,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_0, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_0, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_0, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_0, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_0, CMT);
++
++#ifndef __ARM_NEON
++#error __ARM_NEON not defined.
+#endif
-
- TEST_ALL_EXTRA_CHUNKS(3, 1);
-
-@@ -464,6 +539,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_1, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_1, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_1, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_1, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_1, CMT);
++
++#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
++#error Invalid value for __ARM_FP
+#endif
-
- TEST_ALL_EXTRA_CHUNKS(3, 2);
-
-@@ -484,6 +563,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st3_2, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st3_2, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st3_2, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st3_2, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st3_2, CMT);
++
++#include "arm_neon.h"
++
++float16_t
++foo (float16x4_t b)
++{
++ float16x4_t a = {2.0, 3.0, 4.0, 5.0};
++ float16x4_t res = vadd_f16 (a, b);
++
++ return res[0];
++}
++
++/* { dg-final { scan-assembler "vadd\\.f16\td\[0-9\]+, d\[0-9\]+" } } */
++
++#pragma GCC pop_options
++
++/* Check that the FP version is correctly reset to mfpu=fp-armv8. */
++
++#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
++#error __ARM_FP should record FP16 support.
+#endif
-
-
- /* Check vst4_lane/vst4q_lane. */
-@@ -509,6 +592,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_0, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_0, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_0, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_0, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_0, CMT);
++
++#pragma GCC pop_options
++
++/* Check that the FP version is correctly reset to mfpu=vfp. */
++
++#if !defined (__ARM_FP) || (__ARM_FP & 0x2)
++#error Unexpected value for __ARM_FP.
+#endif
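In the test above, the fpu= target pragmas are what toggle the ACLE feature macros: each push_options/target pair re-evaluates __ARM_FP and the FP16 arithmetic macros, and each pop_options restores the previous state, which is why the file can require the FP16 bit of __ARM_FP while fp-armv8 or neon-fp-armv8 is selected and require it clear again once the options are popped back to plain vfp. A condensed sketch of that mechanism, under the same assumption that fp-armv8 provides half-precision support (bit 0x2 of __ARM_FP):

#pragma GCC push_options
#pragma GCC target ("fpu=fp-armv8")
#if !(__ARM_FP & 0x2)           /* FP16 bit should be set while fp-armv8 is active.  */
#error FP16 support expected here
#endif
#pragma GCC pop_options         /* Previous FPU selection and macros are restored.  */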
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_saddl.c
+@@ -0,0 +1,17 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++long overflow_add (long x, long y)
++{
++ long r;
++
++ int ovr = __builtin_saddl_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "adds" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_saddll.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++long long overflow_add (long long x, long long y)
++{
++ long long r;
++
++ int ovr = __builtin_saddll_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "adds" } } */
++/* { dg-final { scan-assembler "adcs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_ssubl.c
+@@ -0,0 +1,17 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++long overflow_sub (long x, long y)
++{
++ long r;
++
++ int ovr = __builtin_ssubl_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "subs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_ssubll.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++long long overflow_sub (long long x, long long y)
++{
++ long long r;
++
++ int ovr = __builtin_ssubll_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "subs" } } */
++/* { dg-final { scan-assembler "sbcs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_uaddl.c
+@@ -0,0 +1,17 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++unsigned long overflow_add (unsigned long x, unsigned long y)
++{
++ unsigned long r;
++
++ int ovr = __builtin_uaddl_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "adds" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_uaddll.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++unsigned long long overflow_add (unsigned long long x, unsigned long long y)
++{
++ unsigned long long r;
++
++ int ovr = __builtin_uaddll_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "adds" } } */
++/* { dg-final { scan-assembler "adcs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_usubl.c
+@@ -0,0 +1,17 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++unsigned long overflow_sub (unsigned long x, unsigned long y)
++{
++ unsigned long r;
++
++ int ovr = __builtin_usubl_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "subs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/builtin_usubll.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-options "-O2" } */
++/* { dg-require-effective-target arm32 } */
++extern void overflow_handler ();
++
++unsigned long long overflow_sub (unsigned long long x, unsigned long long y)
++{
++ unsigned long long r;
++
++ int ovr = __builtin_usubll_overflow (x, y, &r);
++ if (ovr)
++ overflow_handler ();
++
++ return r;
++}
++
++/* { dg-final { scan-assembler "subs" } } */
++/* { dg-final { scan-assembler "sbcs" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/cbz.c
+@@ -0,0 +1,12 @@
++/* { dg-do compile {target { arm_thumb2 || arm_thumb1_cbz_ok } } } */
++/* { dg-options "-O2" } */
++
++int
++foo (int a, int *b)
++{
++ if (a)
++ *b = 1;
++ return 0;
++}
++
++/* { dg-final { scan-assembler-times "cbz\\tr\\d" 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/data-rel-1.c
+@@ -0,0 +1,12 @@
++/* { dg-options "-fPIC -mno-pic-data-is-text-relative" } */
++/* { dg-final { scan-assembler-not "j-\\(.LPIC" } } */
++/* { dg-final { scan-assembler-not "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
++/* { dg-final { scan-assembler "j\\(GOT\\)" } } */
++/* { dg-final { scan-assembler "(ldr|mov)\tr\[0-9\]+, \\\[?r9" } } */
++
++static int j;
++
++int *Foo ()
++{
++ return &j;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/data-rel-2.c
+@@ -0,0 +1,11 @@
++/* { dg-options "-fPIC -mno-pic-data-is-text-relative -mno-single-pic-base" } */
++/* { dg-final { scan-assembler-not "j-\\(.LPIC" } } */
++/* { dg-final { scan-assembler "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
++/* { dg-final { scan-assembler "j\\(GOT\\)" } } */
++
++static int j;
++
++int *Foo ()
++{
++ return &j;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/data-rel-3.c
+@@ -0,0 +1,11 @@
++/* { dg-options "-fPIC -mpic-data-is-text-relative" } */
++/* { dg-final { scan-assembler "j-\\(.LPIC" } } */
++/* { dg-final { scan-assembler-not "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
++/* { dg-final { scan-assembler-not "j\\(GOT\\)" } } */
++
++static int j;
++
++int *Foo ()
++{
++ return &j;
++}
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
+@@ -0,0 +1,21 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_hard_vfp_ok } */
++/* { dg-require-effective-target arm_fp16_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_fp16_ieee } */
++
++/* Test __fp16 arguments and return value in registers (hard-float). */
++
++void
++swap (__fp16, __fp16);
++
++__fp16
++F (__fp16 a, __fp16 b, __fp16 c)
++{
++ swap (b, a);
++ return c;
++}
++
++/* { dg-final { scan-assembler {vmov(\.f16)?\tr[0-9]+, s[0-9]+} } } */
++/* { dg-final { scan-assembler {vmov(\.f32)?\ts1, s0} } } */
++/* { dg-final { scan-assembler {vmov(\.f16)?\ts0, r[0-9]+} } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
+@@ -0,0 +1,21 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_ok } */
++/* { dg-options "-mfloat-abi=softfp -O2" } */
++/* { dg-add-options arm_fp16_ieee } */
++/* { dg-skip-if "incompatible float-abi" { arm*-*-* } { "-mfloat-abi=hard" } } */
++
++/* Test __fp16 arguments and return value in registers (softfp). */
++
++void
++swap (__fp16, __fp16);
++
++__fp16
++F (__fp16 a, __fp16 b, __fp16 c)
++{
++ swap (b, a);
++ return c;
++}
++
++/* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } } */
++/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } } */
++/* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } } */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- TEST_ALL_EXTRA_CHUNKS(4, 1);
-
-@@ -529,6 +616,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_1, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_1, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_1, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_1, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_1, CMT);
-+#endif
+ __fp16 xx = 0.0;
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative -pedantic -std=gnu99" } */
- TEST_ALL_EXTRA_CHUNKS(4, 2);
+ #include <math.h>
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative -pedantic -std=gnu99" } */
-@@ -549,6 +640,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_2, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_2, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_2, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_2, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_2, CMT);
-+#endif
+ #include <math.h>
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- TEST_ALL_EXTRA_CHUNKS(4, 3);
+ float xx __attribute__((mode(HF))) = 0.0;
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
-@@ -569,6 +664,10 @@ void exec_vstX_lane (void)
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_st4_3, CMT);
- CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected_st4_3, CMT);
- CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_st4_3, CMT);
-+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
-+ CHECK_FP(TEST_MSG, float, 16, 4, PRIx16, expected_st4_3, CMT);
-+ CHECK_FP(TEST_MSG, float, 16, 8, PRIx16, expected_st4_3, CMT);
-+#endif
- }
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-3.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- int main (void)
---- a/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
-@@ -32,10 +32,21 @@ VECT_VAR_DECL(expected_unsigned,uint,16,8) [] = { 0x0, 0xffff,
- VECT_VAR_DECL(expected_unsigned,uint,32,4) [] = { 0x0, 0xffffffff,
- 0x0, 0xffffffff };
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-4.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
--#ifndef INSN_NAME
-+/* Expected results with poly input. */
-+VECT_VAR_DECL(expected_poly,uint,8,8) [] = { 0x0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_poly,uint,8,16) [] = { 0x0, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff,
-+ 0xff, 0xff, 0xff, 0xff };
-+VECT_VAR_DECL(expected_poly,uint,16,4) [] = { 0x0, 0xffff, 0x0, 0xffff };
-+VECT_VAR_DECL(expected_poly,uint,16,8) [] = { 0x0, 0xffff,
-+ 0x0, 0xffff,
-+ 0xffff, 0xffff,
-+ 0xffff, 0xffff };
-+
- #define INSN_NAME vtst
- #define TEST_MSG "VTST/VTSTQ"
--#endif
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-5.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- /* We can't use the standard ref_v_binary_op.c template because vtst
- has no 64 bits variant, and outputs are always of uint type. */
-@@ -73,12 +84,16 @@ FNNAME (INSN_NAME)
- VDUP(vector2, , uint, u, 8, 8, 15);
- VDUP(vector2, , uint, u, 16, 4, 5);
- VDUP(vector2, , uint, u, 32, 2, 1);
-+ VDUP(vector2, , poly, p, 8, 8, 15);
-+ VDUP(vector2, , poly, p, 16, 4, 5);
- VDUP(vector2, q, int, s, 8, 16, 15);
- VDUP(vector2, q, int, s, 16, 8, 5);
- VDUP(vector2, q, int, s, 32, 4, 1);
- VDUP(vector2, q, uint, u, 8, 16, 15);
- VDUP(vector2, q, uint, u, 16, 8, 5);
- VDUP(vector2, q, uint, u, 32, 4, 1);
-+ VDUP(vector2, q, poly, p, 8, 16, 15);
-+ VDUP(vector2, q, poly, p, 16, 8, 5);
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-6.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- #define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR, T1, T2) \
- MACRO(VAR, , T1, T2, 8, 8); \
-@@ -111,6 +126,18 @@ FNNAME (INSN_NAME)
- CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_unsigned, CMT);
- CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_unsigned, CMT);
- CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected_unsigned, CMT);
-+
-+ /* Now, test the variants with poly8 and poly16 as input. */
-+#undef CMT
-+#define CMT " (poly input)"
-+ TEST_BINARY_OP(INSN_NAME, , poly, p, 8, 8);
-+ TEST_BINARY_OP(INSN_NAME, , poly, p, 16, 4);
-+ TEST_BINARY_OP(INSN_NAME, q, poly, p, 8, 16);
-+ TEST_BINARY_OP(INSN_NAME, q, poly, p, 16, 8);
-+ CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_poly, CMT);
-+ CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected_poly, CMT);
-+ CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_poly, CMT);
-+ CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected_poly, CMT);
- }
+ /* This number is the maximum value representable in the alternative
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-7.c
+@@ -1,4 +1,5 @@
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative -pedantic" } */
- int main (void)
---- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c
+ /* This number overflows the range of the alternative encoding. Since this
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-8.c
@@ -1,4 +1,5 @@
- /* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
-+/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
- /* { dg-options "-O2 -mcpu=dummy" } */
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- void f ()
---- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-alt-9.c
@@ -1,4 +1,5 @@
- /* { dg-error "missing" "" {target "aarch64*-*-*" } } */
-+/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
- /* { dg-options "-O2 -mcpu=cortex-a53+no" } */
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
- void f ()
---- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c
+ /* Encoding taken from: http://en.wikipedia.org/wiki/Half_precision */
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-none-1.c
@@ -1,4 +1,5 @@
- /* { dg-error "invalid feature" "" {target "aarch64*-*-*" } } */
-+/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
- /* { dg-options "-O2 -mcpu=cortex-a53+dummy" } */
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_none_ok } */
+ /* { dg-options "-mfp16-format=none" } */
- void f ()
---- a/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-4.c
+ /* __fp16 type name is not recognized unless you explicitly enable it
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-compile-none-2.c
@@ -1,4 +1,5 @@
- /* { dg-error "missing" "" {target "aarch64*-*-*" } } */
-+/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
- /* { dg-options "-O2 -mcpu=+dummy" } */
+ /* { dg-do compile } */
++/* { dg-require-effective-target arm_fp16_none_ok } */
+ /* { dg-options "-mfp16-format=none" } */
- void f ()
---- a/src/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
-@@ -110,6 +110,6 @@ main (int argc, char **argv)
- /* vfmaq_lane_f64.
- vfma_laneq_f64.
- vfmaq_laneq_f64. */
--/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d\\\[\[0-9\]+\\\]" 3 } } */
-+/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2?d\\\[\[0-9\]+\\\]" 3 } } */
+ /* mode(HF) attributes are not recognized unless you explicitly enable
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-param-1.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-param-1.c
+@@ -1,10 +1,14 @@
+ /* { dg-do compile } */
+ /* { dg-options "-mfp16-format=ieee" } */
+-/* Functions cannot have parameters of type __fp16. */
+-extern void f (__fp16); /* { dg-error "parameters cannot have __fp16 type" } */
+-extern void (*pf) (__fp16); /* { dg-error "parameters cannot have __fp16 type" } */
++/* Test that the ACLE macro is defined. */
++#if __ARM_FP16_ARGS != 1
++#error Unexpected value for __ARM_FP16_ARGS
++#endif
++
++/* Test that __fp16 is supported as a parameter type. */
++extern void f (__fp16);
++extern void (*pf) (__fp16);
---- a/src/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
-@@ -111,6 +111,6 @@ main (int argc, char **argv)
- /* vfmsq_lane_f64.
- vfms_laneq_f64.
- vfmsq_laneq_f64. */
--/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d\\\[\[0-9\]+\\\]" 3 } } */
-+/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2?d\\\[\[0-9\]+\\\]" 3 } } */
+-/* These should be OK. */
+ extern void g (__fp16 *);
+ extern void (*pg) (__fp16 *);
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-return-1.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-return-1.c
+@@ -1,10 +1,9 @@
+ /* { dg-do compile } */
+ /* { dg-options "-mfp16-format=ieee" } */
+-/* Functions cannot return type __fp16. */
+-extern __fp16 f (void); /* { dg-error "cannot return __fp16" } */
+-extern __fp16 (*pf) (void); /* { dg-error "cannot return __fp16" } */
++/* Test that __fp16 is supported as a return type. */
++extern __fp16 f (void);
++extern __fp16 (*pf) (void);
---- a/src/gcc/testsuite/gcc.target/aarch64/fmovd-zero-reg.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/fmovd-zero-reg.c
-@@ -8,4 +8,4 @@ foo (void)
- bar (0.0);
- }
+-/* These should be OK. */
+ extern __fp16 *g (void);
+ extern __fp16 *(*pg) (void);
+--- a/src/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c
++++ b/src/gcc/testsuite/gcc.target/arm/fp16-rounding-alt-1.c
+@@ -3,6 +3,7 @@
+ from double to __fp16. */
--/* { dg-final { scan-assembler "fmov\\td0, xzr" } } */
-+/* { dg-final { scan-assembler "movi\\td0, #0" } } */
---- a/src/gcc/testsuite/gcc.target/aarch64/fmovf-zero-reg.c
-+++ b/src/gcc/testsuite/gcc.target/aarch64/fmovf-zero-reg.c
-@@ -8,4 +8,4 @@ foo (void)
- bar (0.0);
- }
+ /* { dg-do run } */
++/* { dg-require-effective-target arm_fp16_alternative_ok } */
+ /* { dg-options "-mfp16-format=alternative" } */
--/* { dg-final { scan-assembler "fmov\\ts0, wzr" } } */
-+/* { dg-final { scan-assembler "movi\\tv0\.2s, #0" } } */
+ #include <stdlib.h>
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/pr37780_1.c
-@@ -0,0 +1,46 @@
-+/* Test that we can remove the conditional move due to CLZ
-+ and CTZ being defined at zero. */
-+
-+/* { dg-do compile } */
++++ b/src/gcc/testsuite/gcc.target/arm/movdi_movw.c
+@@ -0,0 +1,12 @@
++/* { dg-do compile { target { arm_thumb2_ok || arm_thumb1_movt_ok } } } */
+/* { dg-options "-O2" } */
+
-+int
-+fooctz (int i)
++long long
++movdi (int a)
+{
-+ return (i == 0) ? 32 : __builtin_ctz (i);
++ return 0xF0F0;
+}
+
-+int
-+fooctz2 (int i)
-+{
-+ return (i != 0) ? __builtin_ctz (i) : 32;
-+}
++/* Accept r1 because big endian targets put the low bits in the highest
++ numbered register of a pair. */
++/* { dg-final { scan-assembler-times "movw\tr\[01\], #61680" 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/movhi_movw.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile { target { arm_thumb2_ok || arm_thumb1_movt_ok } } } */
++/* { dg-options "-O2" } */
+
-+unsigned int
-+fooctz3 (unsigned int i)
++short
++movsi (void)
+{
-+ return (i > 0) ? __builtin_ctz (i) : 32;
++ return (short) 0x7070;
+}
+
-+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
-+
-+int
-+fooclz (int i)
-+{
-+ return (i == 0) ? 32 : __builtin_clz (i);
-+}
++/* { dg-final { scan-assembler-times "movw\tr0, #28784" 1 } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/movsi_movw.c
+@@ -0,0 +1,10 @@
++/* { dg-do compile { target { arm_thumb2_ok || arm_thumb1_movt_ok } } } */
++/* { dg-options "-O2" } */
+
+int
-+fooclz2 (int i)
-+{
-+ return (i != 0) ? __builtin_clz (i) : 32;
-+}
-+
-+unsigned int
-+fooclz3 (unsigned int i)
++movsi (void)
+{
-+ return (i > 0) ? __builtin_clz (i) : 32;
++ return 0xF0F0;
+}
+
-+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
-+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
++/* { dg-final { scan-assembler-times "movw\tr0, #61680" 1 } } */
--- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
-@@ -0,0 +1,541 @@
-+/* Test the vmul_n_f64 AArch64 SIMD intrinsic. */
-+
-+/* { dg-do run } */
-+/* { dg-options "-O2 --save-temps" } */
++++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
+@@ -0,0 +1,19 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-options "-O3" } */
++/* { dg-add-options arm_neon } */
+
-+#include "arm_neon.h"
+
-+extern void abort (void);
+
-+#define A (132.4f)
-+#define B (-0.0f)
-+#define C (-34.8f)
-+#define D (289.34f)
-+float32_t expected2_1[2] = {A * A, B * A};
-+float32_t expected2_2[2] = {A * B, B * B};
-+float32_t expected4_1[4] = {A * A, B * A, C * A, D * A};
-+float32_t expected4_2[4] = {A * B, B * B, C * B, D * B};
-+float32_t expected4_3[4] = {A * C, B * C, C * C, D * C};
-+float32_t expected4_4[4] = {A * D, B * D, C * D, D * D};
-+float32_t _elemA = A;
-+float32_t _elemB = B;
-+float32_t _elemC = C;
-+float32_t _elemD = D;
++int
++t6 (int len, void * dummy, short * __restrict x)
++{
++ len = len & ~31;
++ int result = 0;
++ __asm volatile ("");
++ for (int i = 0; i < len; i++)
++ result += x[i];
++ return result;
++}
+
-+#define AD (1234.5)
-+#define BD (-0.0)
-+#define CD (71.3)
-+#define DD (-1024.4)
-+float64_t expectedd2_1[2] = {AD * CD, BD * CD};
-+float64_t expectedd2_2[2] = {AD * DD, BD * DD};
-+float64_t _elemdC = CD;
-+float64_t _elemdD = DD;
++/* { dg-final { scan-assembler "vaddw\.s16" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-options "-O3" } */
++/* { dg-add-options arm_neon } */
+
+
-+#define AS (1024)
-+#define BS (-31)
-+#define CS (0)
-+#define DS (655)
-+int32_t expecteds2_1[2] = {AS * AS, BS * AS};
-+int32_t expecteds2_2[2] = {AS * BS, BS * BS};
-+int32_t expecteds4_1[4] = {AS * AS, BS * AS, CS * AS, DS * AS};
-+int32_t expecteds4_2[4] = {AS * BS, BS * BS, CS * BS, DS * BS};
-+int32_t expecteds4_3[4] = {AS * CS, BS * CS, CS * CS, DS * CS};
-+int32_t expecteds4_4[4] = {AS * DS, BS * DS, CS * DS, DS * DS};
-+int32_t _elemsA = AS;
-+int32_t _elemsB = BS;
-+int32_t _elemsC = CS;
-+int32_t _elemsD = DS;
++int
++t6 (int len, void * dummy, int * __restrict x)
++{
++ len = len & ~31;
++ long long result = 0;
++ __asm volatile ("");
++ for (int i = 0; i < len; i++)
++ result += x[i];
++ return result;
++}
+
-+#define AH ((int16_t) 0)
-+#define BH ((int16_t) -32)
-+#define CH ((int16_t) 102)
-+#define DH ((int16_t) -51)
-+#define EH ((int16_t) 71)
-+#define FH ((int16_t) -91)
-+#define GH ((int16_t) 48)
-+#define HH ((int16_t) 255)
-+int16_t expectedh4_1[4] = {AH * AH, BH * AH, CH * AH, DH * AH};
-+int16_t expectedh4_2[4] = {AH * BH, BH * BH, CH * BH, DH * BH};
-+int16_t expectedh4_3[4] = {AH * CH, BH * CH, CH * CH, DH * CH};
-+int16_t expectedh4_4[4] = {AH * DH, BH * DH, CH * DH, DH * DH};
-+int16_t expectedh8_1[8] = {AH * AH, BH * AH, CH * AH, DH * AH,
-+ EH * AH, FH * AH, GH * AH, HH * AH};
-+int16_t expectedh8_2[8] = {AH * BH, BH * BH, CH * BH, DH * BH,
-+ EH * BH, FH * BH, GH * BH, HH * BH};
-+int16_t expectedh8_3[8] = {AH * CH, BH * CH, CH * CH, DH * CH,
-+ EH * CH, FH * CH, GH * CH, HH * CH};
-+int16_t expectedh8_4[8] = {AH * DH, BH * DH, CH * DH, DH * DH,
-+ EH * DH, FH * DH, GH * DH, HH * DH};
-+int16_t expectedh8_5[8] = {AH * EH, BH * EH, CH * EH, DH * EH,
-+ EH * EH, FH * EH, GH * EH, HH * EH};
-+int16_t expectedh8_6[8] = {AH * FH, BH * FH, CH * FH, DH * FH,
-+ EH * FH, FH * FH, GH * FH, HH * FH};
-+int16_t expectedh8_7[8] = {AH * GH, BH * GH, CH * GH, DH * GH,
-+ EH * GH, FH * GH, GH * GH, HH * GH};
-+int16_t expectedh8_8[8] = {AH * HH, BH * HH, CH * HH, DH * HH,
-+ EH * HH, FH * HH, GH * HH, HH * HH};
-+int16_t _elemhA = AH;
-+int16_t _elemhB = BH;
-+int16_t _elemhC = CH;
-+int16_t _elemhD = DH;
-+int16_t _elemhE = EH;
-+int16_t _elemhF = FH;
-+int16_t _elemhG = GH;
-+int16_t _elemhH = HH;
++/* { dg-final { scan-assembler "vaddw\.s32" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-options "-O3" } */
++/* { dg-add-options arm_neon } */
+
-+#define AUS (1024)
-+#define BUS (31)
-+#define CUS (0)
-+#define DUS (655)
-+uint32_t expectedus2_1[2] = {AUS * AUS, BUS * AUS};
-+uint32_t expectedus2_2[2] = {AUS * BUS, BUS * BUS};
-+uint32_t expectedus4_1[4] = {AUS * AUS, BUS * AUS, CUS * AUS, DUS * AUS};
-+uint32_t expectedus4_2[4] = {AUS * BUS, BUS * BUS, CUS * BUS, DUS * BUS};
-+uint32_t expectedus4_3[4] = {AUS * CUS, BUS * CUS, CUS * CUS, DUS * CUS};
-+uint32_t expectedus4_4[4] = {AUS * DUS, BUS * DUS, CUS * DUS, DUS * DUS};
-+uint32_t _elemusA = AUS;
-+uint32_t _elemusB = BUS;
-+uint32_t _elemusC = CUS;
-+uint32_t _elemusD = DUS;
+
-+#define AUH ((uint16_t) 0)
-+#define BUH ((uint16_t) 32)
-+#define CUH ((uint16_t) 102)
-+#define DUH ((uint16_t) 51)
-+#define EUH ((uint16_t) 71)
-+#define FUH ((uint16_t) 91)
-+#define GUH ((uint16_t) 48)
-+#define HUH ((uint16_t) 255)
-+uint16_t expecteduh4_1[4] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH};
-+uint16_t expecteduh4_2[4] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH};
-+uint16_t expecteduh4_3[4] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH};
-+uint16_t expecteduh4_4[4] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH};
-+uint16_t expecteduh8_1[8] = {AUH * AUH, BUH * AUH, CUH * AUH, DUH * AUH,
-+ EUH * AUH, FUH * AUH, GUH * AUH, HUH * AUH};
-+uint16_t expecteduh8_2[8] = {AUH * BUH, BUH * BUH, CUH * BUH, DUH * BUH,
-+ EUH * BUH, FUH * BUH, GUH * BUH, HUH * BUH};
-+uint16_t expecteduh8_3[8] = {AUH * CUH, BUH * CUH, CUH * CUH, DUH * CUH,
-+ EUH * CUH, FUH * CUH, GUH * CUH, HUH * CUH};
-+uint16_t expecteduh8_4[8] = {AUH * DUH, BUH * DUH, CUH * DUH, DUH * DUH,
-+ EUH * DUH, FUH * DUH, GUH * DUH, HUH * DUH};
-+uint16_t expecteduh8_5[8] = {AUH * EUH, BUH * EUH, CUH * EUH, DUH * EUH,
-+ EUH * EUH, FUH * EUH, GUH * EUH, HUH * EUH};
-+uint16_t expecteduh8_6[8] = {AUH * FUH, BUH * FUH, CUH * FUH, DUH * FUH,
-+ EUH * FUH, FUH * FUH, GUH * FUH, HUH * FUH};
-+uint16_t expecteduh8_7[8] = {AUH * GUH, BUH * GUH, CUH * GUH, DUH * GUH,
-+ EUH * GUH, FUH * GUH, GUH * GUH, HUH * GUH};
-+uint16_t expecteduh8_8[8] = {AUH * HUH, BUH * HUH, CUH * HUH, DUH * HUH,
-+ EUH * HUH, FUH * HUH, GUH * HUH, HUH * HUH};
-+uint16_t _elemuhA = AUH;
-+uint16_t _elemuhB = BUH;
-+uint16_t _elemuhC = CUH;
-+uint16_t _elemuhD = DUH;
-+uint16_t _elemuhE = EUH;
-+uint16_t _elemuhF = FUH;
-+uint16_t _elemuhG = GUH;
-+uint16_t _elemuhH = HUH;
++int
++t6 (int len, void * dummy, unsigned short * __restrict x)
++{
++ len = len & ~31;
++ unsigned int result = 0;
++ __asm volatile ("");
++ for (int i = 0; i < len; i++)
++ result += x[i];
++ return result;
++}
+
-+void
-+check_v2sf (float32_t elemA, float32_t elemB)
++/* { dg-final { scan-assembler "vaddw.u16" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
+@@ -0,0 +1,18 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-options "-O3" } */
++/* { dg-add-options arm_neon } */
++
++
++int
++t6 (int len, void * dummy, unsigned int * __restrict x)
+{
-+ int32_t indx;
-+ const float32_t vec32x2_buf[2] = {A, B};
-+ float32x2_t vec32x2_src = vld1_f32 (vec32x2_buf);
-+ float32_t vec32x2_res[2];
++ len = len & ~31;
++ unsigned long long result = 0;
++ __asm volatile ("");
++ for (int i = 0; i < len; i++)
++ result += x[i];
++ return result;
++}
+
-+ vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemA));
++/* { dg-final { scan-assembler "vaddw\.u32" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
+@@ -0,0 +1,19 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-options "-O3" } */
++/* { dg-add-options arm_neon } */
+
-+ for (indx = 0; indx < 2; indx++)
-+ if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_1[indx])
-+ abort ();
+
-+ vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemB));
+
-+ for (indx = 0; indx < 2; indx++)
-+ if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_2[indx])
-+ abort ();
++int
++t6 (int len, void * dummy, char * __restrict x)
++{
++ len = len & ~31;
++ unsigned short result = 0;
++ __asm volatile ("");
++ for (int i = 0; i < len; i++)
++ result += x[i];
++ return result;
++}
++
++/* { dg-final { scan-assembler "vaddw\.u8" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/neon.exp
++++ b/src//dev/null
+@@ -1,35 +0,0 @@
+-# Copyright (C) 1997-2016 Free Software Foundation, Inc.
+-
+-# This program is free software; you can redistribute it and/or modify
+-# it under the terms of the GNU General Public License as published by
+-# the Free Software Foundation; either version 3 of the License, or
+-# (at your option) any later version.
+-#
+-# This program is distributed in the hope that it will be useful,
+-# but WITHOUT ANY WARRANTY; without even the implied warranty of
+-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+-# GNU General Public License for more details.
+-#
+-# You should have received a copy of the GNU General Public License
+-# along with GCC; see the file COPYING3. If not see
+-# <http://www.gnu.org/licenses/>.
+-
+-# GCC testsuite that uses the `dg.exp' driver.
+-
+-# Exit immediately if this isn't an ARM target.
+-if ![istarget arm*-*-*] then {
+- return
+-}
+-
+-# Load support procs.
+-load_lib gcc-dg.exp
+-
+-# Initialize `dg'.
+-dg-init
+-
+-# Main loop.
+-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+- "" ""
+-
+-# All done.
+-dg-finish
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int8x8_t = vraddhn_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int16x4_t = vraddhn_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int32x2_t = vraddhn_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhnu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint8x8_t = vraddhn_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhnu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint16x4_t = vraddhn_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRaddhnu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRaddhnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRaddhnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint32x2_t = vraddhn_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vraddhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vrhaddq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vrhaddq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vrhaddq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vrhaddq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vrhaddq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vrhaddq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhadds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhadds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhadds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vrhadd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhadds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhadds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhadds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vrhadd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhadds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhadds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhadds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vrhadd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vrhadd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vrhadd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRhaddu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRhaddu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRhaddu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vrhadd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrhadd\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vrshlq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vrshlq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vrshlq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vrshlq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vrshlq_u16 (arg0_uint16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vrshlq_u32 (arg0_uint32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_uint64x2_t = vrshlq_u64 (arg0_uint64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vrshlq_u8 (arg0_uint8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vrshl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vrshl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshls64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshls64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshls64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vrshl_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vrshl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vrshl_u16 (arg0_uint16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vrshl_u32 (arg0_uint32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_uint64x1_t = vrshl_u64 (arg0_uint64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRshlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshlu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vrshl_u8 (arg0_uint8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrshl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vrshrq_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vrshrq_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x2_t = vrshrq_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vrshrq_n_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vrshrq_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vrshrq_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x2_t = vrshrq_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vrshrq_n_u8 (arg0_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vrshr_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vrshr_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x1_t = vrshr_n_s64 (arg0_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vrshr_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vrshr_n_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vrshr_n_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x1_t = vrshr_n_u64 (arg0_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshr_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshr_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshr_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vrshr_n_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshr\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_ns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vrshrn_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_ns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vrshrn_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_ns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vrshrn_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_nu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vrshrn_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_nu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vrshrn_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRshrn_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vRshrn_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRshrn_nu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vrshrn_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrshrn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vrsraq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vrsraq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vrsraq_n_s64 (arg0_int64x2_t, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vrsraq_n_s8 (arg0_int8x16_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vrsraq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vrsraq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vrsraq_n_u64 (arg0_uint64x2_t, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsraQ_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsraQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsraQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vrsraq_n_u8 (arg0_uint8x16_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vrsra_n_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vrsra_n_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vrsra_n_s64 (arg0_int64x1_t, arg1_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vrsra_n_s8 (arg0_int8x8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vrsra_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vrsra_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vrsra_n_u64 (arg0_uint64x1_t, arg1_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsra_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsra_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsra_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vrsra_n_u8 (arg0_uint8x8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vrsra\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int8x8_t = vrsubhn_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int16x4_t = vrsubhn_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int32x2_t = vrsubhn_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhnu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint8x8_t = vrsubhn_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhnu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint16x4_t = vrsubhn_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vRsubhnu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vRsubhnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vRsubhnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint32x2_t = vrsubhn_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsubhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQs16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x8_t arg2_int16x8_t;
+-
+- out_int16x8_t = vabaq_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQs32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x4_t arg2_int32x4_t;
+-
+- out_int32x4_t = vabaq_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQs8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+- int8x16_t arg2_int8x16_t;
+-
+- out_int8x16_t = vabaq_s8 (arg0_int8x16_t, arg1_int8x16_t, arg2_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x8_t arg2_uint16x8_t;
+-
+- out_uint16x8_t = vabaq_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x4_t arg2_uint32x4_t;
+-
+- out_uint32x4_t = vabaq_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabaQu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabaQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabaQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+- uint8x16_t arg2_uint8x16_t;
+-
+- out_uint8x16_t = vabaq_u8 (arg0_uint8x16_t, arg1_uint8x16_t, arg2_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabals16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabals16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabals16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vabal_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabals32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabals32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabals32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vabal_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabals8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabals8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabals8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int16x8_t = vabal_s8 (arg0_int16x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabalu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabalu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabalu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint32x4_t = vabal_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabalu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabalu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabalu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint64x2_t = vabal_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabalu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabalu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabalu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint16x8_t = vabal_u8 (arg0_uint16x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabal\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabas16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabas16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabas16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vaba_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabas32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabas32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabas32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vaba_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabas8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabas8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabas8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vaba_s8 (arg0_int8x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabau16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabau16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabau16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vaba_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabau32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabau32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabau32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vaba_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabau8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vabau8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabau8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vaba_u8 (arg0_uint8x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaba\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vabdq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vabdq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vabdq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vabdq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vabdq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vabdq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vabdq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vabd_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vabdl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vabdl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vabdl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdlu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vabdl_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdlu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vabdl_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdlu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vabdl_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabdl\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vabd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vabd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vabd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vabd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vabd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabdu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vabdu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabdu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vabd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabd\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabsQf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabsQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabsQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vabsq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabsQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabsQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabsQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vabsq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabsQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabsQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabsQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vabsq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabsQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabsQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabsQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vabsq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabsf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabsf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabsf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vabs_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabss16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabss16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabss16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vabs_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabss32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabss32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabss32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vabs_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vabss8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vabss8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vabss8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vabs_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vabs\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vaddq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vaddq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vaddq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vaddq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vaddq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vaddq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vaddq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vaddq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vaddq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vadd_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int8x8_t = vaddhn_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int16x4_t = vaddhn_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int32x2_t = vaddhn_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhnu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint8x8_t = vaddhn_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhnu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint16x4_t = vaddhn_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddhnu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddhnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddhnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint32x2_t = vaddhn_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vaddl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vaddl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vaddl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddlu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vaddl_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddlu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vaddl_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddlu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vaddl_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddl\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vadds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vadds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vadds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vadd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vadds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vadds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vadds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vadd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vadds64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vadds64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vadds64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vadd_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vadds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vadds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vadds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vadd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vadd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vadd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vaddu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vadd_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vadd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vadd\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddws16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddws16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddws16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vaddw_s16 (arg0_int32x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddws32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddws32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddws32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vaddw_s32 (arg0_int64x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddws8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddws8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddws8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vaddw_s8 (arg0_int16x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddwu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddwu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddwu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vaddw_u16 (arg0_uint32x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddwu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddwu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddwu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vaddw_u32 (arg0_uint64x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vaddwu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vaddwu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vaddwu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vaddw_u8 (arg0_uint16x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vaddw\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vandq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vandq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vandq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vandq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vandq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vandq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vandq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vandq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vands16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vands16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vands16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vand_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vands32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vands32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vands32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vand_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vands64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vands64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vands64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vand_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vands8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vands8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vands8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vand_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vand_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vand_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vandu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vand_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vandu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vandu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vandu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vand_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vand\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int16x8_t out_int16x8_t;
+-int16x8_t arg0_int16x8_t;
+-int16x8_t arg1_int16x8_t;
+-void test_vbicQs16 (void)
+-{
+-
+- out_int16x8_t = vbicq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int32x4_t out_int32x4_t;
+-int32x4_t arg0_int32x4_t;
+-int32x4_t arg1_int32x4_t;
+-void test_vbicQs32 (void)
+-{
+-
+- out_int32x4_t = vbicq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int64x2_t out_int64x2_t;
+-int64x2_t arg0_int64x2_t;
+-int64x2_t arg1_int64x2_t;
+-void test_vbicQs64 (void)
+-{
+-
+- out_int64x2_t = vbicq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int8x16_t out_int8x16_t;
+-int8x16_t arg0_int8x16_t;
+-int8x16_t arg1_int8x16_t;
+-void test_vbicQs8 (void)
+-{
+-
+- out_int8x16_t = vbicq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint16x8_t out_uint16x8_t;
+-uint16x8_t arg0_uint16x8_t;
+-uint16x8_t arg1_uint16x8_t;
+-void test_vbicQu16 (void)
+-{
+-
+- out_uint16x8_t = vbicq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint32x4_t out_uint32x4_t;
+-uint32x4_t arg0_uint32x4_t;
+-uint32x4_t arg1_uint32x4_t;
+-void test_vbicQu32 (void)
+-{
+-
+- out_uint32x4_t = vbicq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint64x2_t out_uint64x2_t;
+-uint64x2_t arg0_uint64x2_t;
+-uint64x2_t arg1_uint64x2_t;
+-void test_vbicQu64 (void)
+-{
+-
+- out_uint64x2_t = vbicq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint8x16_t out_uint8x16_t;
+-uint8x16_t arg0_uint8x16_t;
+-uint8x16_t arg1_uint8x16_t;
+-void test_vbicQu8 (void)
+-{
+-
+- out_uint8x16_t = vbicq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbics16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbics16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int16x4_t out_int16x4_t;
+-int16x4_t arg0_int16x4_t;
+-int16x4_t arg1_int16x4_t;
+-void test_vbics16 (void)
+-{
+-
+- out_int16x4_t = vbic_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbics32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbics32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int32x2_t out_int32x2_t;
+-int32x2_t arg0_int32x2_t;
+-int32x2_t arg1_int32x2_t;
+-void test_vbics32 (void)
+-{
+-
+- out_int32x2_t = vbic_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbics64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vbics64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int64x1_t out_int64x1_t;
+-int64x1_t arg0_int64x1_t;
+-int64x1_t arg1_int64x1_t;
+-void test_vbics64 (void)
+-{
+-
+- out_int64x1_t = vbic_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbics8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbics8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int8x8_t out_int8x8_t;
+-int8x8_t arg0_int8x8_t;
+-int8x8_t arg1_int8x8_t;
+-void test_vbics8 (void)
+-{
+-
+- out_int8x8_t = vbic_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint16x4_t out_uint16x4_t;
+-uint16x4_t arg0_uint16x4_t;
+-uint16x4_t arg1_uint16x4_t;
+-void test_vbicu16 (void)
+-{
+-
+- out_uint16x4_t = vbic_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint32x2_t out_uint32x2_t;
+-uint32x2_t arg0_uint32x2_t;
+-uint32x2_t arg1_uint32x2_t;
+-void test_vbicu32 (void)
+-{
+-
+- out_uint32x2_t = vbic_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vbicu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint64x1_t out_uint64x1_t;
+-uint64x1_t arg0_uint64x1_t;
+-uint64x1_t arg1_uint64x1_t;
+-void test_vbicu64 (void)
+-{
+-
+- out_uint64x1_t = vbic_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbicu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vbicu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint8x8_t out_uint8x8_t;
+-uint8x8_t arg0_uint8x8_t;
+-uint8x8_t arg1_uint8x8_t;
+-void test_vbicu8 (void)
+-{
+-
+- out_uint8x8_t = vbic_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vbic\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x4_t arg2_float32x4_t;
+-
+- out_float32x4_t = vbslq_f32 (arg0_uint32x4_t, arg1_float32x4_t, arg2_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQp16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+- poly16x8_t arg2_poly16x8_t;
+-
+- out_poly16x8_t = vbslq_p16 (arg0_uint16x8_t, arg1_poly16x8_t, arg2_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQp64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQp64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- poly64x2_t arg1_poly64x2_t;
+- poly64x2_t arg2_poly64x2_t;
+-
+- out_poly64x2_t = vbslq_p64 (arg0_uint64x2_t, arg1_poly64x2_t, arg2_poly64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQp8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+- poly8x16_t arg2_poly8x16_t;
+-
+- out_poly8x16_t = vbslq_p8 (arg0_uint8x16_t, arg1_poly8x16_t, arg2_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQs16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x8_t arg2_int16x8_t;
+-
+- out_int16x8_t = vbslq_s16 (arg0_uint16x8_t, arg1_int16x8_t, arg2_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQs32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x4_t arg2_int32x4_t;
+-
+- out_int32x4_t = vbslq_s32 (arg0_uint32x4_t, arg1_int32x4_t, arg2_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQs64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- int64x2_t arg1_int64x2_t;
+- int64x2_t arg2_int64x2_t;
+-
+- out_int64x2_t = vbslq_s64 (arg0_uint64x2_t, arg1_int64x2_t, arg2_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQs8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- int8x16_t arg1_int8x16_t;
+- int8x16_t arg2_int8x16_t;
+-
+- out_int8x16_t = vbslq_s8 (arg0_uint8x16_t, arg1_int8x16_t, arg2_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x8_t arg2_uint16x8_t;
+-
+- out_uint16x8_t = vbslq_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x4_t arg2_uint32x4_t;
+-
+- out_uint32x4_t = vbslq_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQu64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+- uint64x2_t arg2_uint64x2_t;
+-
+- out_uint64x2_t = vbslq_u64 (arg0_uint64x2_t, arg1_uint64x2_t, arg2_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslQu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+- uint8x16_t arg2_uint8x16_t;
+-
+- out_uint8x16_t = vbslq_u8 (arg0_uint8x16_t, arg1_uint8x16_t, arg2_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vbsl_f32 (arg0_uint32x2_t, arg1_float32x2_t, arg2_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslp16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslp16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+- poly16x4_t arg2_poly16x4_t;
+-
+- out_poly16x4_t = vbsl_p16 (arg0_uint16x4_t, arg1_poly16x4_t, arg2_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslp64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslp64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+- poly64x1_t arg2_poly64x1_t;
+-
+- out_poly64x1_t = vbsl_p64 (arg0_uint64x1_t, arg1_poly64x1_t, arg2_poly64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslp8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+- poly8x8_t arg2_poly8x8_t;
+-
+- out_poly8x8_t = vbsl_p8 (arg0_uint8x8_t, arg1_poly8x8_t, arg2_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbsls16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbsls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbsls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vbsl_s16 (arg0_uint16x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbsls32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbsls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbsls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vbsl_s32 (arg0_uint32x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbsls64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbsls64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbsls64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- int64x1_t arg1_int64x1_t;
+- int64x1_t arg2_int64x1_t;
+-
+- out_int64x1_t = vbsl_s64 (arg0_uint64x1_t, arg1_int64x1_t, arg2_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbsls8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbsls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbsls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vbsl_s8 (arg0_uint8x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vbsl_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vbsl_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslu64.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+- uint64x1_t arg2_uint64x1_t;
+-
+- out_uint64x1_t = vbsl_u64 (arg0_uint64x1_t, arg1_uint64x1_t, arg2_uint64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vbslu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vbslu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vbslu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vbsl_u8 (arg0_uint8x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "((vbsl)|(vbit)|(vbif))\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcageQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcageQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcageQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcageq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacge\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcagef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcagef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcagef32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcage_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacge\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcagtQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcagtQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcagtQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcagtq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacgt\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcagtf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcagtf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcagtf32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcagt_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacgt\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcaleQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcaleQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcaleQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcaleq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacge\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcalef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcalef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcalef32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcale_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacge\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcaltQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcaltQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcaltQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcaltq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacgt\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcaltf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcaltf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcaltf32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcalt_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vacgt\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vceqq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQp8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_uint8x16_t = vceqq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vceqq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vceqq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vceqq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vceqq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vceqq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vceqq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqf32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vceq_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqp8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_uint8x8_t = vceq_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqs16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vceq_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqs32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vceq_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vceqs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vceqs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vceqs8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vceq_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcequ16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcequ16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcequ16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vceq_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcequ32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcequ32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcequ32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vceq_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcequ8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcequ8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcequ8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vceq_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vceq\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcgeq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vcgeq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vcgeq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vcgeq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vcgeq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vcgeq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vcgeq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgef32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcge_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcges16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcges16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcges16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vcge_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcges32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcges32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcges32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vcge_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcges8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcges8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcges8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vcge_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vcge_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vcge_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgeu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgeu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgeu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vcge_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcgtq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vcgtq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vcgtq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vcgtq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vcgtq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vcgtq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vcgtq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtf32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcgt_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgts16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgts16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgts16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vcgt_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgts32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgts32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgts32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vcgt_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgts8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgts8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgts8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vcgt_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vcgt_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vcgt_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcgtu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcgtu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcgtu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vcgt_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcleq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vcleq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vcleq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vcleq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vcleq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vcleq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vcleq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vclef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclef32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vcle_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcles16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcles16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcles16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vcle_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcles32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcles32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcles32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vcle_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcles8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcles8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcles8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vcle_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vcle_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vcle_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcleu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcleu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcleu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vcle_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcge\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclsQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclsQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclsQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vclsq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclsQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclsQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclsQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vclsq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclsQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclsQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclsQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vclsq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclss16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclss16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclss16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vcls_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclss32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclss32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclss32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vcls_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclss8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclss8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclss8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vcls_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcls\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQf32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_uint32x4_t = vcltq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vcltq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vcltq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vcltq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vcltq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vcltq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vcltq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltf32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_uint32x2_t = vclt_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclts16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vclts16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclts16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vclt_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclts32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vclts32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclts32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vclt_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclts8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vclts8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclts8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vclt_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vclt_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vclt_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcltu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vcltu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcltu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vclt_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcgt\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vclzq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vclzq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vclzq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vclzq_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vclzq_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzQu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vclzq_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vclz_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vclz_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vclz_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vclz_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vclz_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vclzu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vclzu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vclzu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vclz_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vclz\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcntQp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcntQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcntQp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x16_t = vcntq_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcntQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcntQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcntQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vcntq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcntQu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcntQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcntQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vcntq_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcntp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcntp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcntp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vcnt_p8 (arg0_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcnts8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcnts8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcnts8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vcnt_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcntu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcntu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcntu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vcnt_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcnt\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombinef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombinef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombinef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x4_t = vcombine_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombinep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombinep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombinep16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x8_t = vcombine_p16 (arg0_poly16x4_t, arg1_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombinep64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombinep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombinep64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x1_t arg0_poly64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- out_poly64x2_t = vcombine_p64 (arg0_poly64x1_t, arg1_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombinep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombinep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombinep8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x16_t = vcombine_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombines16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombines16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombines16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x8_t = vcombine_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombines32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombines32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombines32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x4_t = vcombine_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombines64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombines64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombines64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x2_t = vcombine_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombines8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombines8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombines8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x16_t = vcombine_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombineu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombineu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombineu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x8_t = vcombine_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombineu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombineu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombineu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x4_t = vcombine_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombineu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombineu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombineu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x2_t = vcombine_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcombineu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcombineu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcombineu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x16_t = vcombine_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreatef32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreatef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreatef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint64_t arg0_uint64_t;
+-
+- out_float32x2_t = vcreate_f32 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreatep16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreatep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreatep16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint64_t arg0_uint64_t;
+-
+- out_poly16x4_t = vcreate_p16 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreatep64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreatep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreatep64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint64_t arg0_uint64_t;
+-
+- out_poly64x1_t = vcreate_p64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreatep8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreatep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreatep8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint64_t arg0_uint64_t;
+-
+- out_poly8x8_t = vcreate_p8 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreates16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreates16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreates16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint64_t arg0_uint64_t;
+-
+- out_int16x4_t = vcreate_s16 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreates32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreates32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreates32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint64_t arg0_uint64_t;
+-
+- out_int32x2_t = vcreate_s32 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreates64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreates64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreates64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint64_t arg0_uint64_t;
+-
+- out_int64x1_t = vcreate_s64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreates8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreates8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreates8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint64_t arg0_uint64_t;
+-
+- out_int8x8_t = vcreate_s8 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreateu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreateu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreateu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint16x4_t = vcreate_u16 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreateu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreateu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreateu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint32x2_t = vcreate_u32 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreateu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreateu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreateu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint64x1_t = vcreate_u64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcreateu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vcreateu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcreateu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint8x8_t = vcreate_u8 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQ_nf32_s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQ_nf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQ_nf32_s32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_float32x4_t = vcvtq_n_f32_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQ_nf32_u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQ_nf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQ_nf32_u32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_float32x4_t = vcvtq_n_f32_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQ_ns32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQ_ns32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQ_ns32_f32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int32x4_t = vcvtq_n_s32_f32 (arg0_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.s32.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQ_nu32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQ_nu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQ_nu32_f32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint32x4_t = vcvtq_n_u32_f32 (arg0_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.u32.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQf32_s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQf32_s32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_float32x4_t = vcvtq_f32_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQf32_u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQf32_u32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_float32x4_t = vcvtq_f32_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQs32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQs32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQs32_f32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int32x4_t = vcvtq_s32_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.s32.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtQu32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtQu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtQu32_f32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint32x4_t = vcvtq_u32_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.u32.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvt_nf32_s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvt_nf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvt_nf32_s32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_float32x2_t = vcvt_n_f32_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvt_nf32_u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvt_nf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvt_nf32_u32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_float32x2_t = vcvt_n_f32_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvt_ns32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvt_ns32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvt_ns32_f32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int32x2_t = vcvt_n_s32_f32 (arg0_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.s32.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvt_nu32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvt_nu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvt_nu32_f32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint32x2_t = vcvt_n_u32_f32 (arg0_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.u32.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtf16_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtf16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_fp16_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon_fp16 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtf16_f32 (void)
+-{
+- float16x4_t out_float16x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float16x4_t = vcvt_f16_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f16.f32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtf32_f16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtf32_f16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_fp16_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon_fp16 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtf32_f16 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float16x4_t arg0_float16x4_t;
+-
+- out_float32x4_t = vcvt_f32_f16 (arg0_float16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.f16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtf32_s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtf32_s32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_float32x2_t = vcvt_f32_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtf32_u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtf32_u32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_float32x2_t = vcvt_f32_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.f32.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvts32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvts32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvts32_f32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int32x2_t = vcvt_s32_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.s32.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vcvtu32_f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vcvtu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vcvtu32_f32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint32x2_t = vcvt_u32_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vcvt\.u32.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x4_t = vdupq_lane_f32 (arg0_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanep16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly16x8_t = vdupq_lane_p16 (arg0_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanep64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanep64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_poly64x2_t = vdupq_lane_p64 (arg0_poly64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanep8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x16_t = vdupq_lane_p8 (arg0_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x8_t = vdupq_lane_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x4_t = vdupq_lane_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanes64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanes64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x2_t = vdupq_lane_s64 (arg0_int64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_lanes8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x16_t = vdupq_lane_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x8_t = vdupq_lane_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x4_t = vdupq_lane_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_laneu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_laneu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x2_t = vdupq_lane_u64 (arg0_uint64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_laneu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x16_t = vdupq_lane_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_nf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_nf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32_t arg0_float32_t;
+-
+- out_float32x4_t = vdupq_n_f32 (arg0_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_np16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_np16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16_t arg0_poly16_t;
+-
+- out_poly16x8_t = vdupq_n_p16 (arg0_poly16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_np64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_np64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64_t arg0_poly64_t;
+-
+- out_poly64x2_t = vdupq_n_p64 (arg0_poly64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_np8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_np8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8_t arg0_poly8_t;
+-
+- out_poly8x16_t = vdupq_n_p8 (arg0_poly8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16_t arg0_int16_t;
+-
+- out_int16x8_t = vdupq_n_s16 (arg0_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32_t arg0_int32_t;
+-
+- out_int32x4_t = vdupq_n_s32 (arg0_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_ns64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64_t arg0_int64_t;
+-
+- out_int64x2_t = vdupq_n_s64 (arg0_int64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8_t arg0_int8_t;
+-
+- out_int8x16_t = vdupq_n_s8 (arg0_int8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16_t arg0_uint16_t;
+-
+- out_uint16x8_t = vdupq_n_u16 (arg0_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32_t arg0_uint32_t;
+-
+- out_uint32x4_t = vdupq_n_u32 (arg0_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_nu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdupQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint64x2_t = vdupq_n_u64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdupQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdupQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdupQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8_t arg0_uint8_t;
+-
+- out_uint8x16_t = vdupq_n_u8 (arg0_uint8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vdup_lane_f32 (arg0_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanep16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly16x4_t = vdup_lane_p16 (arg0_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanep64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanep64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_poly64x1_t = vdup_lane_p64 (arg0_poly64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanep8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vdup_lane_p8 (arg0_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vdup_lane_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vdup_lane_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanes64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanes64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x1_t = vdup_lane_s64 (arg0_int64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_lanes8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vdup_lane_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vdup_lane_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vdup_lane_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_laneu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_laneu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x1_t = vdup_lane_u64 (arg0_uint64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_laneu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vdup_lane_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_nf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_nf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32_t arg0_float32_t;
+-
+- out_float32x2_t = vdup_n_f32 (arg0_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_np16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_np16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16_t arg0_poly16_t;
+-
+- out_poly16x4_t = vdup_n_p16 (arg0_poly16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_np64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_np64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64_t arg0_poly64_t;
+-
+- out_poly64x1_t = vdup_n_p64 (arg0_poly64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_np8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_np8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8_t arg0_poly8_t;
+-
+- out_poly8x8_t = vdup_n_p8 (arg0_poly8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16_t arg0_int16_t;
+-
+- out_int16x4_t = vdup_n_s16 (arg0_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32_t arg0_int32_t;
+-
+- out_int32x2_t = vdup_n_s32 (arg0_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_ns64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64_t arg0_int64_t;
+-
+- out_int64x1_t = vdup_n_s64 (arg0_int64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8_t arg0_int8_t;
+-
+- out_int8x8_t = vdup_n_s8 (arg0_int8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16_t arg0_uint16_t;
+-
+- out_uint16x4_t = vdup_n_u16 (arg0_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32_t arg0_uint32_t;
+-
+- out_uint32x2_t = vdup_n_u32 (arg0_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_nu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vdup_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint64x1_t = vdup_n_u64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vdup_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vdup_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vdup_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8_t arg0_uint8_t;
+-
+- out_uint8x8_t = vdup_n_u8 (arg0_uint8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = veorq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = veorq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = veorq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = veorq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = veorq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = veorq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = veorq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veorQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veorQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veorQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = veorq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veors16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veors16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veors16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = veor_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veors32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veors32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veors32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = veor_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veors64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `veors64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veors64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = veor_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veors8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veors8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veors8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = veor_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veoru16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veoru16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veoru16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = veor_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veoru32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veoru32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veoru32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = veor_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veoru64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `veoru64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veoru64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = veor_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/veoru8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `veoru8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_veoru8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = veor_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "veor\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vextq_f32 (arg0_float32x4_t, arg1_float32x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8_t = vextq_p16 (arg0_poly16x8_t, arg1_poly16x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQp64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQp64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x2_t arg0_poly64x2_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- out_poly64x2_t = vextq_p64 (arg0_poly64x2_t, arg1_poly64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vextq_p8 (arg0_poly8x16_t, arg1_poly8x16_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vextq_s16 (arg0_int16x8_t, arg1_int16x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vextq_s32 (arg0_int32x4_t, arg1_int32x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vextq_s64 (arg0_int64x2_t, arg1_int64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vextq_s8 (arg0_int8x16_t, arg1_int8x16_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vextq_u16 (arg0_uint16x8_t, arg1_uint16x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vextq_u32 (arg0_uint32x4_t, arg1_uint32x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vextq_u64 (arg0_uint64x2_t, arg1_uint64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vextq_u8 (arg0_uint8x16_t, arg1_uint8x16_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vext_f32 (arg0_float32x2_t, arg1_float32x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextp16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4_t = vext_p16 (arg0_poly16x4_t, arg1_poly16x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextp64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextp64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- out_poly64x1_t = vext_p64 (arg0_poly64x1_t, arg1_poly64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vext_p8 (arg0_poly8x8_t, arg1_poly8x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vexts16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vexts16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vexts16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vext_s16 (arg0_int16x4_t, arg1_int16x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vexts32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vexts32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vexts32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vext_s32 (arg0_int32x2_t, arg1_int32x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vexts64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vexts64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vexts64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vext_s64 (arg0_int64x1_t, arg1_int64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vexts8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vexts8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vexts8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vext_s8 (arg0_int8x8_t, arg1_int8x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vext_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vext_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vext_u64 (arg0_uint64x1_t, arg1_uint64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vextu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vextu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vextu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vext_u8 (arg0_uint8x8_t, arg1_uint8x8_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vext\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vfmaQf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vfmaQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neonv2_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neonv2 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vfmaQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x4_t arg2_float32x4_t;
+-
+- out_float32x4_t = vfmaq_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vfma\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vfmaf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vfmaf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neonv2_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neonv2 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vfmaf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vfma_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vfma\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vfmsQf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vfmsQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neonv2_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neonv2 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vfmsQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x4_t arg2_float32x4_t;
+-
+- out_float32x4_t = vfmsq_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vfms\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vfmsf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vfmsf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neonv2_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neonv2 } */
+-
+-#include "arm_neon.h"
+-
+-void test_vfmsf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vfms_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vfms\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanef32 (void)
+-{
+- float32_t out_float32_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32_t = vgetq_lane_f32 (arg0_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanep16 (void)
+-{
+- poly16_t out_poly16_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly16_t = vgetq_lane_p16 (arg0_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanep8 (void)
+-{
+- poly8_t out_poly8_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8_t = vgetq_lane_p8 (arg0_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanes16 (void)
+-{
+- int16_t out_int16_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16_t = vgetq_lane_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.s16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanes32 (void)
+-{
+- int32_t out_int32_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32_t = vgetq_lane_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanes64 (void)
+-{
+- register int64_t out_int64_t asm ("r0");
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64_t = vgetq_lane_s64 (arg0_int64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "((vmov)|(fmrrd))\[ \]+\[rR\]\[0-9\]+, \[rR\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_lanes8 (void)
+-{
+- int8_t out_int8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8_t = vgetq_lane_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.s8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_laneu16 (void)
+-{
+- uint16_t out_uint16_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16_t = vgetq_lane_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_laneu32 (void)
+-{
+- uint32_t out_uint32_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32_t = vgetq_lane_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_laneu64 (void)
+-{
+- register uint64_t out_uint64_t asm ("r0");
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64_t = vgetq_lane_u64 (arg0_uint64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "((vmov)|(fmrrd))\[ \]+\[rR\]\[0-9\]+, \[rR\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vgetQ_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vgetQ_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vgetQ_laneu8 (void)
+-{
+- uint8_t out_uint8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8_t = vgetq_lane_u8 (arg0_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x2_t = vget_high_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highp16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly16x4_t = vget_high_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highp64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_poly64x1_t = vget_high_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x8_t = vget_high_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highs16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x4_t = vget_high_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highs32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x2_t = vget_high_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highs64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highs64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x1_t = vget_high_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highs8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x8_t = vget_high_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x4_t = vget_high_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x2_t = vget_high_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x1_t = vget_high_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_highu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_highu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_highu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x8_t = vget_high_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanef32 (void)
+-{
+- float32_t out_float32_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32_t = vget_lane_f32 (arg0_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanep16 (void)
+-{
+- poly16_t out_poly16_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly16_t = vget_lane_p16 (arg0_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanep8 (void)
+-{
+- poly8_t out_poly8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8_t = vget_lane_p8 (arg0_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanes16 (void)
+-{
+- int16_t out_int16_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16_t = vget_lane_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.s16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanes32 (void)
+-{
+- int32_t out_int32_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32_t = vget_lane_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanes64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanes64 (void)
+-{
+- int64_t out_int64_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64_t = vget_lane_s64 (arg0_int64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lanes8 (void)
+-{
+- int8_t out_int8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8_t = vget_lane_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.s8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_laneu16 (void)
+-{
+- uint16_t out_uint16_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16_t = vget_lane_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u16\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_laneu32 (void)
+-{
+- uint32_t out_uint32_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32_t = vget_lane_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_laneu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_laneu64 (void)
+-{
+- uint64_t out_uint64_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64_t = vget_lane_u64 (arg0_uint64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_laneu8 (void)
+-{
+- uint8_t out_uint8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8_t = vget_lane_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.u8\[ \]+\[rR\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowf32 (void)
+-{
+- register float32x2_t out_float32x2_t asm ("d18");
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x2_t = vget_low_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowp16 (void)
+-{
+- register poly16x4_t out_poly16x4_t asm ("d18");
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly16x4_t = vget_low_p16 (arg0_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_lowp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowp64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_poly64x1_t = vget_low_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowp8 (void)
+-{
+- register poly8x8_t out_poly8x8_t asm ("d18");
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x8_t = vget_low_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lows16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lows16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lows16 (void)
+-{
+- register int16x4_t out_int16x4_t asm ("d18");
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x4_t = vget_low_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lows32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lows32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lows32 (void)
+-{
+- register int32x2_t out_int32x2_t asm ("d18");
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x2_t = vget_low_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lows64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_lows64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lows64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x1_t = vget_low_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lows8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lows8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lows8 (void)
+-{
+- register int8x8_t out_int8x8_t asm ("d18");
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x8_t = vget_low_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowu16 (void)
+-{
+- register uint16x4_t out_uint16x4_t asm ("d18");
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x4_t = vget_low_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowu32 (void)
+-{
+- register uint32x2_t out_uint32x2_t asm ("d18");
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x2_t = vget_low_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vget_lowu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x1_t = vget_low_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vget_lowu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vget_lowu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vget_lowu8 (void)
+-{
+- register uint8x8_t out_uint8x8_t asm ("d18");
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x8_t = vget_low_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vhaddq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vhaddq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vhaddq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vhaddq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vhaddq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vhaddq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhadds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhadds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhadds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vhadd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhadds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhadds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhadds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vhadd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhadds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhadds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhadds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vhadd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vhadd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vhadd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhaddu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhaddu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhaddu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vhadd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhadd\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vhsubq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vhsubq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vhsubq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vhsubq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vhsubq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vhsubq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vhsub_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vhsub_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vhsub_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vhsub_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vhsub_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vhsubu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vhsubu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vhsubu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vhsub_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vhsub\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+-
+- out_float32x4_t = vld1q_dup_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+-
+- out_poly16x8_t = vld1q_dup_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupp64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+-
+- out_poly64x2_t = vld1q_dup_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+-
+- out_poly8x16_t = vld1q_dup_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dups16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dups16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dups16 (void)
+-{
+- int16x8_t out_int16x8_t;
+-
+- out_int16x8_t = vld1q_dup_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dups32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dups32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dups32 (void)
+-{
+- int32x4_t out_int32x4_t;
+-
+- out_int32x4_t = vld1q_dup_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dups64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dups64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dups64 (void)
+-{
+- int64x2_t out_int64x2_t;
+-
+- out_int64x2_t = vld1q_dup_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dups8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dups8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dups8 (void)
+-{
+- int8x16_t out_int8x16_t;
+-
+- out_int8x16_t = vld1q_dup_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+-
+- out_uint16x8_t = vld1q_dup_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+-
+- out_uint32x4_t = vld1q_dup_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+-
+- out_uint64x2_t = vld1q_dup_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Q_dupu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_dupu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+-
+- out_uint8x16_t = vld1q_dup_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vld1q_lane_f32 (0, arg1_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanep16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8_t = vld1q_lane_p16 (0, arg1_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanep64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanep64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- out_poly64x2_t = vld1q_lane_p64 (0, arg1_poly64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanep8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vld1q_lane_p8 (0, arg1_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vld1q_lane_s16 (0, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vld1q_lane_s32 (0, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanes64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vld1q_lane_s64 (0, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_lanes8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vld1q_lane_s8 (0, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vld1q_lane_u16 (0, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vld1q_lane_u32 (0, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_laneu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vld1q_lane_u64 (0, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Q_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1Q_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Q_laneu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vld1q_lane_u8 (0, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+-
+- out_float32x4_t = vld1q_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+-
+- out_poly16x8_t = vld1q_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qp64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+-
+- out_poly64x2_t = vld1q_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+-
+- out_poly8x16_t = vld1q_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qs16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+-
+- out_int16x8_t = vld1q_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qs32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+-
+- out_int32x4_t = vld1q_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qs64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+-
+- out_int64x2_t = vld1q_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qs8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+-
+- out_int8x16_t = vld1q_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+-
+- out_uint16x8_t = vld1q_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+-
+- out_uint32x4_t = vld1q_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+-
+- out_uint64x2_t = vld1q_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1Qu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1Qu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+-
+- out_uint8x16_t = vld1q_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+-
+- out_float32x2_t = vld1_dup_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupp16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+-
+- out_poly16x4_t = vld1_dup_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupp64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+-
+- out_poly64x1_t = vld1_dup_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+-
+- out_poly8x8_t = vld1_dup_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dups16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dups16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dups16 (void)
+-{
+- int16x4_t out_int16x4_t;
+-
+- out_int16x4_t = vld1_dup_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dups32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dups32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dups32 (void)
+-{
+- int32x2_t out_int32x2_t;
+-
+- out_int32x2_t = vld1_dup_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dups64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dups64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dups64 (void)
+-{
+- int64x1_t out_int64x1_t;
+-
+- out_int64x1_t = vld1_dup_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dups8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dups8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dups8 (void)
+-{
+- int8x8_t out_int8x8_t;
+-
+- out_int8x8_t = vld1_dup_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+-
+- out_uint16x4_t = vld1_dup_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+-
+- out_uint32x2_t = vld1_dup_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+-
+- out_uint64x1_t = vld1_dup_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_dupu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1_dupu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_dupu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+-
+- out_uint8x8_t = vld1_dup_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\\\]\\\})|(\[dD\]\[0-9\]+\\\[\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vld1_lane_f32 (0, arg1_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanep16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4_t = vld1_lane_p16 (0, arg1_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanep64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanep64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- out_poly64x1_t = vld1_lane_p64 (0, arg1_poly64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanep8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vld1_lane_p8 (0, arg1_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vld1_lane_s16 (0, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vld1_lane_s32 (0, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanes64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vld1_lane_s64 (0, arg1_int64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_lanes8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vld1_lane_s8 (0, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vld1_lane_u16 (0, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vld1_lane_u32 (0, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_laneu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vld1_lane_u64 (0, arg1_uint64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld1_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1_laneu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vld1_lane_u8 (0, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1f32 (void)
+-{
+- float32x2_t out_float32x2_t;
+-
+- out_float32x2_t = vld1_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1p16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+-
+- out_poly16x4_t = vld1_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1p64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+-
+- out_poly64x1_t = vld1_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+-
+- out_poly8x8_t = vld1_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1s16 (void)
+-{
+- int16x4_t out_int16x4_t;
+-
+- out_int16x4_t = vld1_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1s32 (void)
+-{
+- int32x2_t out_int32x2_t;
+-
+- out_int32x2_t = vld1_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1s64 (void)
+-{
+- int64x1_t out_int64x1_t;
+-
+- out_int64x1_t = vld1_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+-
+- out_int8x8_t = vld1_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1u16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+-
+- out_uint16x4_t = vld1_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1u32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+-
+- out_uint32x2_t = vld1_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1u64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+-
+- out_uint64x1_t = vld1_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld1u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld1u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld1u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+-
+- out_uint8x8_t = vld1_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_lanef32 (void)
+-{
+- float32x4x2_t out_float32x4x2_t;
+- float32x4x2_t arg1_float32x4x2_t;
+-
+- out_float32x4x2_t = vld2q_lane_f32 (0, arg1_float32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_lanep16 (void)
+-{
+- poly16x8x2_t out_poly16x8x2_t;
+- poly16x8x2_t arg1_poly16x8x2_t;
+-
+- out_poly16x8x2_t = vld2q_lane_p16 (0, arg1_poly16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_lanes16 (void)
+-{
+- int16x8x2_t out_int16x8x2_t;
+- int16x8x2_t arg1_int16x8x2_t;
+-
+- out_int16x8x2_t = vld2q_lane_s16 (0, arg1_int16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_lanes32 (void)
+-{
+- int32x4x2_t out_int32x4x2_t;
+- int32x4x2_t arg1_int32x4x2_t;
+-
+- out_int32x4x2_t = vld2q_lane_s32 (0, arg1_int32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_laneu16 (void)
+-{
+- uint16x8x2_t out_uint16x8x2_t;
+- uint16x8x2_t arg1_uint16x8x2_t;
+-
+- out_uint16x8x2_t = vld2q_lane_u16 (0, arg1_uint16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Q_laneu32 (void)
+-{
+- uint32x4x2_t out_uint32x4x2_t;
+- uint32x4x2_t arg1_uint32x4x2_t;
+-
+- out_uint32x4x2_t = vld2q_lane_u32 (0, arg1_uint32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qf32 (void)
+-{
+- float32x4x2_t out_float32x4x2_t;
+-
+- out_float32x4x2_t = vld2q_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qp16 (void)
+-{
+- poly16x8x2_t out_poly16x8x2_t;
+-
+- out_poly16x8x2_t = vld2q_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qp8 (void)
+-{
+- poly8x16x2_t out_poly8x16x2_t;
+-
+- out_poly8x16x2_t = vld2q_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qs16 (void)
+-{
+- int16x8x2_t out_int16x8x2_t;
+-
+- out_int16x8x2_t = vld2q_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qs32 (void)
+-{
+- int32x4x2_t out_int32x4x2_t;
+-
+- out_int32x4x2_t = vld2q_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qs8 (void)
+-{
+- int8x16x2_t out_int8x16x2_t;
+-
+- out_int8x16x2_t = vld2q_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qu16 (void)
+-{
+- uint16x8x2_t out_uint16x8x2_t;
+-
+- out_uint16x8x2_t = vld2q_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qu32 (void)
+-{
+- uint32x4x2_t out_uint32x4x2_t;
+-
+- out_uint32x4x2_t = vld2q_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2Qu8 (void)
+-{
+- uint8x16x2_t out_uint8x16x2_t;
+-
+- out_uint8x16x2_t = vld2q_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupf32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+-
+- out_float32x2x2_t = vld2_dup_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupp16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+-
+- out_poly16x4x2_t = vld2_dup_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupp64 (void)
+-{
+- poly64x1x2_t out_poly64x1x2_t;
+-
+- out_poly64x1x2_t = vld2_dup_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupp8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+-
+- out_poly8x8x2_t = vld2_dup_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dups16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dups16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dups16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+-
+- out_int16x4x2_t = vld2_dup_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dups32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dups32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dups32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+-
+- out_int32x2x2_t = vld2_dup_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dups64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dups64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dups64 (void)
+-{
+- int64x1x2_t out_int64x1x2_t;
+-
+- out_int64x1x2_t = vld2_dup_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dups8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dups8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dups8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+-
+- out_int8x8x2_t = vld2_dup_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupu16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+-
+- out_uint16x4x2_t = vld2_dup_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupu32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+-
+- out_uint32x2x2_t = vld2_dup_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupu64 (void)
+-{
+- uint64x1x2_t out_uint64x1x2_t;
+-
+- out_uint64x1x2_t = vld2_dup_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_dupu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2_dupu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_dupu8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+-
+- out_uint8x8x2_t = vld2_dup_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanef32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+- float32x2x2_t arg1_float32x2x2_t;
+-
+- out_float32x2x2_t = vld2_lane_f32 (0, arg1_float32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanep16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+- poly16x4x2_t arg1_poly16x4x2_t;
+-
+- out_poly16x4x2_t = vld2_lane_p16 (0, arg1_poly16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanep8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+- poly8x8x2_t arg1_poly8x8x2_t;
+-
+- out_poly8x8x2_t = vld2_lane_p8 (0, arg1_poly8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanes16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+- int16x4x2_t arg1_int16x4x2_t;
+-
+- out_int16x4x2_t = vld2_lane_s16 (0, arg1_int16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanes32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+- int32x2x2_t arg1_int32x2x2_t;
+-
+- out_int32x2x2_t = vld2_lane_s32 (0, arg1_int32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_lanes8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+- int8x8x2_t arg1_int8x8x2_t;
+-
+- out_int8x8x2_t = vld2_lane_s8 (0, arg1_int8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_laneu16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+- uint16x4x2_t arg1_uint16x4x2_t;
+-
+- out_uint16x4x2_t = vld2_lane_u16 (0, arg1_uint16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_laneu32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+- uint32x2x2_t arg1_uint32x2x2_t;
+-
+- out_uint32x2x2_t = vld2_lane_u32 (0, arg1_uint32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld2_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2_laneu8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+- uint8x8x2_t arg1_uint8x8x2_t;
+-
+- out_uint8x8x2_t = vld2_lane_u8 (0, arg1_uint8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2f32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+-
+- out_float32x2x2_t = vld2_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2p16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+-
+- out_poly16x4x2_t = vld2_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2p64 (void)
+-{
+- poly64x1x2_t out_poly64x1x2_t;
+-
+- out_poly64x1x2_t = vld2_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2p8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+-
+- out_poly8x8x2_t = vld2_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2s16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+-
+- out_int16x4x2_t = vld2_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2s32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+-
+- out_int32x2x2_t = vld2_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2s64 (void)
+-{
+- int64x1x2_t out_int64x1x2_t;
+-
+- out_int64x1x2_t = vld2_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2s8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+-
+- out_int8x8x2_t = vld2_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2u16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+-
+- out_uint16x4x2_t = vld2_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2u32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+-
+- out_uint32x2x2_t = vld2_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2u64 (void)
+-{
+- uint64x1x2_t out_uint64x1x2_t;
+-
+- out_uint64x1x2_t = vld2_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld2u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld2u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld2u8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+-
+- out_uint8x8x2_t = vld2_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_lanef32 (void)
+-{
+- float32x4x3_t out_float32x4x3_t;
+- float32x4x3_t arg1_float32x4x3_t;
+-
+- out_float32x4x3_t = vld3q_lane_f32 (0, arg1_float32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_lanep16 (void)
+-{
+- poly16x8x3_t out_poly16x8x3_t;
+- poly16x8x3_t arg1_poly16x8x3_t;
+-
+- out_poly16x8x3_t = vld3q_lane_p16 (0, arg1_poly16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_lanes16 (void)
+-{
+- int16x8x3_t out_int16x8x3_t;
+- int16x8x3_t arg1_int16x8x3_t;
+-
+- out_int16x8x3_t = vld3q_lane_s16 (0, arg1_int16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_lanes32 (void)
+-{
+- int32x4x3_t out_int32x4x3_t;
+- int32x4x3_t arg1_int32x4x3_t;
+-
+- out_int32x4x3_t = vld3q_lane_s32 (0, arg1_int32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_laneu16 (void)
+-{
+- uint16x8x3_t out_uint16x8x3_t;
+- uint16x8x3_t arg1_uint16x8x3_t;
+-
+- out_uint16x8x3_t = vld3q_lane_u16 (0, arg1_uint16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Q_laneu32 (void)
+-{
+- uint32x4x3_t out_uint32x4x3_t;
+- uint32x4x3_t arg1_uint32x4x3_t;
+-
+- out_uint32x4x3_t = vld3q_lane_u32 (0, arg1_uint32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qf32 (void)
+-{
+- float32x4x3_t out_float32x4x3_t;
+-
+- out_float32x4x3_t = vld3q_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qp16 (void)
+-{
+- poly16x8x3_t out_poly16x8x3_t;
+-
+- out_poly16x8x3_t = vld3q_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qp8 (void)
+-{
+- poly8x16x3_t out_poly8x16x3_t;
+-
+- out_poly8x16x3_t = vld3q_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qs16 (void)
+-{
+- int16x8x3_t out_int16x8x3_t;
+-
+- out_int16x8x3_t = vld3q_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qs32 (void)
+-{
+- int32x4x3_t out_int32x4x3_t;
+-
+- out_int32x4x3_t = vld3q_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qs8 (void)
+-{
+- int8x16x3_t out_int8x16x3_t;
+-
+- out_int8x16x3_t = vld3q_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qu16 (void)
+-{
+- uint16x8x3_t out_uint16x8x3_t;
+-
+- out_uint16x8x3_t = vld3q_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qu32 (void)
+-{
+- uint32x4x3_t out_uint32x4x3_t;
+-
+- out_uint32x4x3_t = vld3q_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3Qu8 (void)
+-{
+- uint8x16x3_t out_uint8x16x3_t;
+-
+- out_uint8x16x3_t = vld3q_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupf32 (void)
+-{
+- float32x2x3_t out_float32x2x3_t;
+-
+- out_float32x2x3_t = vld3_dup_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupp16 (void)
+-{
+- poly16x4x3_t out_poly16x4x3_t;
+-
+- out_poly16x4x3_t = vld3_dup_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupp64 (void)
+-{
+- poly64x1x3_t out_poly64x1x3_t;
+-
+- out_poly64x1x3_t = vld3_dup_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupp8 (void)
+-{
+- poly8x8x3_t out_poly8x8x3_t;
+-
+- out_poly8x8x3_t = vld3_dup_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dups16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dups16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dups16 (void)
+-{
+- int16x4x3_t out_int16x4x3_t;
+-
+- out_int16x4x3_t = vld3_dup_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dups32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dups32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dups32 (void)
+-{
+- int32x2x3_t out_int32x2x3_t;
+-
+- out_int32x2x3_t = vld3_dup_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dups64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dups64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dups64 (void)
+-{
+- int64x1x3_t out_int64x1x3_t;
+-
+- out_int64x1x3_t = vld3_dup_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dups8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dups8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dups8 (void)
+-{
+- int8x8x3_t out_int8x8x3_t;
+-
+- out_int8x8x3_t = vld3_dup_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupu16 (void)
+-{
+- uint16x4x3_t out_uint16x4x3_t;
+-
+- out_uint16x4x3_t = vld3_dup_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupu32 (void)
+-{
+- uint32x2x3_t out_uint32x2x3_t;
+-
+- out_uint32x2x3_t = vld3_dup_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupu64 (void)
+-{
+- uint64x1x3_t out_uint64x1x3_t;
+-
+- out_uint64x1x3_t = vld3_dup_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_dupu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3_dupu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_dupu8 (void)
+-{
+- uint8x8x3_t out_uint8x8x3_t;
+-
+- out_uint8x8x3_t = vld3_dup_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanef32 (void)
+-{
+- float32x2x3_t out_float32x2x3_t;
+- float32x2x3_t arg1_float32x2x3_t;
+-
+- out_float32x2x3_t = vld3_lane_f32 (0, arg1_float32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanep16 (void)
+-{
+- poly16x4x3_t out_poly16x4x3_t;
+- poly16x4x3_t arg1_poly16x4x3_t;
+-
+- out_poly16x4x3_t = vld3_lane_p16 (0, arg1_poly16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanep8 (void)
+-{
+- poly8x8x3_t out_poly8x8x3_t;
+- poly8x8x3_t arg1_poly8x8x3_t;
+-
+- out_poly8x8x3_t = vld3_lane_p8 (0, arg1_poly8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanes16 (void)
+-{
+- int16x4x3_t out_int16x4x3_t;
+- int16x4x3_t arg1_int16x4x3_t;
+-
+- out_int16x4x3_t = vld3_lane_s16 (0, arg1_int16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanes32 (void)
+-{
+- int32x2x3_t out_int32x2x3_t;
+- int32x2x3_t arg1_int32x2x3_t;
+-
+- out_int32x2x3_t = vld3_lane_s32 (0, arg1_int32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_lanes8 (void)
+-{
+- int8x8x3_t out_int8x8x3_t;
+- int8x8x3_t arg1_int8x8x3_t;
+-
+- out_int8x8x3_t = vld3_lane_s8 (0, arg1_int8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_laneu16 (void)
+-{
+- uint16x4x3_t out_uint16x4x3_t;
+- uint16x4x3_t arg1_uint16x4x3_t;
+-
+- out_uint16x4x3_t = vld3_lane_u16 (0, arg1_uint16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_laneu32 (void)
+-{
+- uint32x2x3_t out_uint32x2x3_t;
+- uint32x2x3_t arg1_uint32x2x3_t;
+-
+- out_uint32x2x3_t = vld3_lane_u32 (0, arg1_uint32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld3_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3_laneu8 (void)
+-{
+- uint8x8x3_t out_uint8x8x3_t;
+- uint8x8x3_t arg1_uint8x8x3_t;
+-
+- out_uint8x8x3_t = vld3_lane_u8 (0, arg1_uint8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3f32 (void)
+-{
+- float32x2x3_t out_float32x2x3_t;
+-
+- out_float32x2x3_t = vld3_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3p16 (void)
+-{
+- poly16x4x3_t out_poly16x4x3_t;
+-
+- out_poly16x4x3_t = vld3_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3p64 (void)
+-{
+- poly64x1x3_t out_poly64x1x3_t;
+-
+- out_poly64x1x3_t = vld3_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3p8 (void)
+-{
+- poly8x8x3_t out_poly8x8x3_t;
+-
+- out_poly8x8x3_t = vld3_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3s16 (void)
+-{
+- int16x4x3_t out_int16x4x3_t;
+-
+- out_int16x4x3_t = vld3_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3s32 (void)
+-{
+- int32x2x3_t out_int32x2x3_t;
+-
+- out_int32x2x3_t = vld3_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3s64 (void)
+-{
+- int64x1x3_t out_int64x1x3_t;
+-
+- out_int64x1x3_t = vld3_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3s8 (void)
+-{
+- int8x8x3_t out_int8x8x3_t;
+-
+- out_int8x8x3_t = vld3_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3u16 (void)
+-{
+- uint16x4x3_t out_uint16x4x3_t;
+-
+- out_uint16x4x3_t = vld3_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3u32 (void)
+-{
+- uint32x2x3_t out_uint32x2x3_t;
+-
+- out_uint32x2x3_t = vld3_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3u64 (void)
+-{
+- uint64x1x3_t out_uint64x1x3_t;
+-
+- out_uint64x1x3_t = vld3_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld3u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld3u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld3u8 (void)
+-{
+- uint8x8x3_t out_uint8x8x3_t;
+-
+- out_uint8x8x3_t = vld3_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_lanef32 (void)
+-{
+- float32x4x4_t out_float32x4x4_t;
+- float32x4x4_t arg1_float32x4x4_t;
+-
+- out_float32x4x4_t = vld4q_lane_f32 (0, arg1_float32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_lanep16 (void)
+-{
+- poly16x8x4_t out_poly16x8x4_t;
+- poly16x8x4_t arg1_poly16x8x4_t;
+-
+- out_poly16x8x4_t = vld4q_lane_p16 (0, arg1_poly16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_lanes16 (void)
+-{
+- int16x8x4_t out_int16x8x4_t;
+- int16x8x4_t arg1_int16x8x4_t;
+-
+- out_int16x8x4_t = vld4q_lane_s16 (0, arg1_int16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_lanes32 (void)
+-{
+- int32x4x4_t out_int32x4x4_t;
+- int32x4x4_t arg1_int32x4x4_t;
+-
+- out_int32x4x4_t = vld4q_lane_s32 (0, arg1_int32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_laneu16 (void)
+-{
+- uint16x8x4_t out_uint16x8x4_t;
+- uint16x8x4_t arg1_uint16x8x4_t;
+-
+- out_uint16x8x4_t = vld4q_lane_u16 (0, arg1_uint16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Q_laneu32 (void)
+-{
+- uint32x4x4_t out_uint32x4x4_t;
+- uint32x4x4_t arg1_uint32x4x4_t;
+-
+- out_uint32x4x4_t = vld4q_lane_u32 (0, arg1_uint32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qf32 (void)
+-{
+- float32x4x4_t out_float32x4x4_t;
+-
+- out_float32x4x4_t = vld4q_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qp16 (void)
+-{
+- poly16x8x4_t out_poly16x8x4_t;
+-
+- out_poly16x8x4_t = vld4q_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qp8 (void)
+-{
+- poly8x16x4_t out_poly8x16x4_t;
+-
+- out_poly8x16x4_t = vld4q_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qs16 (void)
+-{
+- int16x8x4_t out_int16x8x4_t;
+-
+- out_int16x8x4_t = vld4q_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qs32 (void)
+-{
+- int32x4x4_t out_int32x4x4_t;
+-
+- out_int32x4x4_t = vld4q_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qs8 (void)
+-{
+- int8x16x4_t out_int8x16x4_t;
+-
+- out_int8x16x4_t = vld4q_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qu16 (void)
+-{
+- uint16x8x4_t out_uint16x8x4_t;
+-
+- out_uint16x8x4_t = vld4q_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qu32 (void)
+-{
+- uint32x4x4_t out_uint32x4x4_t;
+-
+- out_uint32x4x4_t = vld4q_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4Qu8 (void)
+-{
+- uint8x16x4_t out_uint8x16x4_t;
+-
+- out_uint8x16x4_t = vld4q_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupf32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupf32 (void)
+-{
+- float32x2x4_t out_float32x2x4_t;
+-
+- out_float32x2x4_t = vld4_dup_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupp16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupp16 (void)
+-{
+- poly16x4x4_t out_poly16x4x4_t;
+-
+- out_poly16x4x4_t = vld4_dup_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupp64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupp64 (void)
+-{
+- poly64x1x4_t out_poly64x1x4_t;
+-
+- out_poly64x1x4_t = vld4_dup_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupp8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupp8 (void)
+-{
+- poly8x8x4_t out_poly8x8x4_t;
+-
+- out_poly8x8x4_t = vld4_dup_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dups16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dups16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dups16 (void)
+-{
+- int16x4x4_t out_int16x4x4_t;
+-
+- out_int16x4x4_t = vld4_dup_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dups32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dups32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dups32 (void)
+-{
+- int32x2x4_t out_int32x2x4_t;
+-
+- out_int32x2x4_t = vld4_dup_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dups64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dups64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dups64 (void)
+-{
+- int64x1x4_t out_int64x1x4_t;
+-
+- out_int64x1x4_t = vld4_dup_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dups8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dups8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dups8 (void)
+-{
+- int8x8x4_t out_int8x8x4_t;
+-
+- out_int8x8x4_t = vld4_dup_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupu16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupu16 (void)
+-{
+- uint16x4x4_t out_uint16x4x4_t;
+-
+- out_uint16x4x4_t = vld4_dup_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupu32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupu32 (void)
+-{
+- uint32x2x4_t out_uint32x2x4_t;
+-
+- out_uint32x2x4_t = vld4_dup_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupu64 (void)
+-{
+- uint64x1x4_t out_uint64x1x4_t;
+-
+- out_uint64x1x4_t = vld4_dup_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_dupu8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4_dupu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_dupu8 (void)
+-{
+- uint8x8x4_t out_uint8x8x4_t;
+-
+- out_uint8x8x4_t = vld4_dup_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\\\]-\[dD\]\[0-9\]+\\\[\\\])|(\[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\], \[dD\]\[0-9\]+\\\[\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanef32 (void)
+-{
+- float32x2x4_t out_float32x2x4_t;
+- float32x2x4_t arg1_float32x2x4_t;
+-
+- out_float32x2x4_t = vld4_lane_f32 (0, arg1_float32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanep16 (void)
+-{
+- poly16x4x4_t out_poly16x4x4_t;
+- poly16x4x4_t arg1_poly16x4x4_t;
+-
+- out_poly16x4x4_t = vld4_lane_p16 (0, arg1_poly16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanep8 (void)
+-{
+- poly8x8x4_t out_poly8x8x4_t;
+- poly8x8x4_t arg1_poly8x8x4_t;
+-
+- out_poly8x8x4_t = vld4_lane_p8 (0, arg1_poly8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanes16 (void)
+-{
+- int16x4x4_t out_int16x4x4_t;
+- int16x4x4_t arg1_int16x4x4_t;
+-
+- out_int16x4x4_t = vld4_lane_s16 (0, arg1_int16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanes32 (void)
+-{
+- int32x2x4_t out_int32x2x4_t;
+- int32x2x4_t arg1_int32x2x4_t;
+-
+- out_int32x2x4_t = vld4_lane_s32 (0, arg1_int32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_lanes8 (void)
+-{
+- int8x8x4_t out_int8x8x4_t;
+- int8x8x4_t arg1_int8x8x4_t;
+-
+- out_int8x8x4_t = vld4_lane_s8 (0, arg1_int8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_laneu16 (void)
+-{
+- uint16x4x4_t out_uint16x4x4_t;
+- uint16x4x4_t arg1_uint16x4x4_t;
+-
+- out_uint16x4x4_t = vld4_lane_u16 (0, arg1_uint16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_laneu32 (void)
+-{
+- uint32x2x4_t out_uint32x2x4_t;
+- uint32x2x4_t arg1_uint32x2x4_t;
+-
+- out_uint32x2x4_t = vld4_lane_u32 (0, arg1_uint32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vld4_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4_laneu8 (void)
+-{
+- uint8x8x4_t out_uint8x8x4_t;
+- uint8x8x4_t arg1_uint8x8x4_t;
+-
+- out_uint8x8x4_t = vld4_lane_u8 (0, arg1_uint8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4f32 (void)
+-{
+- float32x2x4_t out_float32x2x4_t;
+-
+- out_float32x2x4_t = vld4_f32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4p16 (void)
+-{
+- poly16x4x4_t out_poly16x4x4_t;
+-
+- out_poly16x4x4_t = vld4_p16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4p64 (void)
+-{
+- poly64x1x4_t out_poly64x1x4_t;
+-
+- out_poly64x1x4_t = vld4_p64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4p8 (void)
+-{
+- poly8x8x4_t out_poly8x8x4_t;
+-
+- out_poly8x8x4_t = vld4_p8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4s16 (void)
+-{
+- int16x4x4_t out_int16x4x4_t;
+-
+- out_int16x4x4_t = vld4_s16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4s32 (void)
+-{
+- int32x2x4_t out_int32x2x4_t;
+-
+- out_int32x2x4_t = vld4_s32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4s64 (void)
+-{
+- int64x1x4_t out_int64x1x4_t;
+-
+- out_int64x1x4_t = vld4_s64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4s8 (void)
+-{
+- int8x8x4_t out_int8x8x4_t;
+-
+- out_int8x8x4_t = vld4_s8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4u16 (void)
+-{
+- uint16x4x4_t out_uint16x4x4_t;
+-
+- out_uint16x4x4_t = vld4_u16 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4u32 (void)
+-{
+- uint32x2x4_t out_uint32x2x4_t;
+-
+- out_uint32x2x4_t = vld4_u32 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4u64 (void)
+-{
+- uint64x1x4_t out_uint64x1x4_t;
+-
+- out_uint64x1x4_t = vld4_u64 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vld4u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vld4u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vld4u8 (void)
+-{
+- uint8x8x4_t out_uint8x8x4_t;
+-
+- out_uint8x8x4_t = vld4_u8 (0);
+-}
+-
+-/* { dg-final { scan-assembler "vld4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vmaxq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vmaxq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vmaxq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vmaxq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vmaxq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vmaxq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vmaxq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vmax_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vmax_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vmax_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vmax_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vmax_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vmax_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmaxu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmaxu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmaxu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vmax_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmax\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vminq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vminq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vminq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vminq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vminq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vminq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vminq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vmin_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmins16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmins16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmins16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vmin_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmins32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmins32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmins32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vmin_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmins8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmins8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmins8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vmin_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vmin_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vmin_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vminu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vminu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vminu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vmin_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmin\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_lanef32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x4_t = vmlaq_lane_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x8_t = vmlaq_lane_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x4_t = vmlaq_lane_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x8_t = vmlaq_lane_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x4_t = vmlaq_lane_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_nf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_nf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32_t arg2_float32_t;
+-
+- out_float32x4_t = vmlaq_n_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16_t arg2_int16_t;
+-
+- out_int16x8_t = vmlaq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32_t arg2_int32_t;
+-
+- out_int32x4_t = vmlaq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint16x8_t = vmlaq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQ_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint32x4_t = vmlaq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x4_t arg2_float32x4_t;
+-
+- out_float32x4_t = vmlaq_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQs16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x8_t arg2_int16x8_t;
+-
+- out_int16x8_t = vmlaq_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQs32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x4_t arg2_int32x4_t;
+-
+- out_int32x4_t = vmlaq_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQs8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+- int8x16_t arg2_int8x16_t;
+-
+- out_int8x16_t = vmlaq_s8 (arg0_int8x16_t, arg1_int8x16_t, arg2_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x8_t arg2_uint16x8_t;
+-
+- out_uint16x8_t = vmlaq_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x4_t arg2_uint32x4_t;
+-
+- out_uint32x4_t = vmlaq_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaQu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+- uint8x16_t arg2_uint8x16_t;
+-
+- out_uint8x16_t = vmlaq_u8 (arg0_uint8x16_t, arg1_uint8x16_t, arg2_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_lanef32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vmla_lane_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vmla_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vmla_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vmla_lane_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vmla_lane_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_nf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_nf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32_t arg2_float32_t;
+-
+- out_float32x2_t = vmla_n_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int16x4_t = vmla_n_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int32x2_t = vmla_n_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint16x4_t = vmla_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmla_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmla_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmla_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint32x2_t = vmla_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlaf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlaf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlaf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vmla_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vmlal_lane_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vmlal_lane_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_laneu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint32x4_t = vmlal_lane_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_laneu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint64x2_t = vmlal_lane_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int32x4_t = vmlal_n_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int64x2_t = vmlal_n_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_nu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint32x4_t = vmlal_n_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlal_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlal_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlal_nu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint64x2_t = vmlal_n_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlals16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlals16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlals16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vmlal_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlals32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlals32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlals32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vmlal_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlals8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlals8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlals8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int16x8_t = vmlal_s8 (arg0_int16x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlalu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlalu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlalu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint32x4_t = vmlal_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlalu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlalu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlalu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint64x2_t = vmlal_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlalu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlalu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlalu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint16x8_t = vmlal_u8 (arg0_uint16x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlal\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlas16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlas16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlas16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vmla_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlas32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlas32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlas32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vmla_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlas8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlas8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlas8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vmla_s8 (arg0_int8x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlau16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlau16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlau16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vmla_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlau32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlau32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlau32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vmla_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlau8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlau8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlau8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vmla_u8 (arg0_uint8x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmla\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_lanef32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x4_t = vmlsq_lane_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x8_t = vmlsq_lane_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x4_t = vmlsq_lane_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x8_t = vmlsq_lane_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x4_t = vmlsq_lane_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_nf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_nf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32_t arg2_float32_t;
+-
+- out_float32x4_t = vmlsq_n_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16_t arg2_int16_t;
+-
+- out_int16x8_t = vmlsq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32_t arg2_int32_t;
+-
+- out_int32x4_t = vmlsq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint16x8_t = vmlsq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQ_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint32x4_t = vmlsq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+- float32x4_t arg2_float32x4_t;
+-
+- out_float32x4_t = vmlsq_f32 (arg0_float32x4_t, arg1_float32x4_t, arg2_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQs16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+- int16x8_t arg2_int16x8_t;
+-
+- out_int16x8_t = vmlsq_s16 (arg0_int16x8_t, arg1_int16x8_t, arg2_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQs32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+- int32x4_t arg2_int32x4_t;
+-
+- out_int32x4_t = vmlsq_s32 (arg0_int32x4_t, arg1_int32x4_t, arg2_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQs8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+- int8x16_t arg2_int8x16_t;
+-
+- out_int8x16_t = vmlsq_s8 (arg0_int8x16_t, arg1_int8x16_t, arg2_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+- uint16x8_t arg2_uint16x8_t;
+-
+- out_uint16x8_t = vmlsq_u16 (arg0_uint16x8_t, arg1_uint16x8_t, arg2_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+- uint32x4_t arg2_uint32x4_t;
+-
+- out_uint32x4_t = vmlsq_u32 (arg0_uint32x4_t, arg1_uint32x4_t, arg2_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsQu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+- uint8x16_t arg2_uint8x16_t;
+-
+- out_uint8x16_t = vmlsq_u8 (arg0_uint8x16_t, arg1_uint8x16_t, arg2_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_lanef32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vmls_lane_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vmls_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vmls_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vmls_lane_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vmls_lane_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_nf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_nf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32_t arg2_float32_t;
+-
+- out_float32x2_t = vmls_n_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int16x4_t = vmls_n_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int32x2_t = vmls_n_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint16x4_t = vmls_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmls_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmls_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmls_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint32x2_t = vmls_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsf32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+- float32x2_t arg2_float32x2_t;
+-
+- out_float32x2_t = vmls_f32 (arg0_float32x2_t, arg1_float32x2_t, arg2_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vmlsl_lane_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vmlsl_lane_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_laneu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_laneu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint32x4_t = vmlsl_lane_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_laneu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_laneu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint64x2_t = vmlsl_lane_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int32x4_t = vmlsl_n_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int64x2_t = vmlsl_n_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_nu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_nu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16_t arg2_uint16_t;
+-
+- out_uint32x4_t = vmlsl_n_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsl_nu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsl_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsl_nu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32_t arg2_uint32_t;
+-
+- out_uint64x2_t = vmlsl_n_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsls16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vmlsl_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsls32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vmlsl_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsls8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int16x8_t = vmlsl_s8 (arg0_int16x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlslu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlslu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlslu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint32x4_t = vmlsl_u16 (arg0_uint32x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlslu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlslu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlslu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint64x2_t = vmlsl_u32 (arg0_uint64x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlslu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlslu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlslu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint16x8_t = vmlsl_u8 (arg0_uint16x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmlsl\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlss16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlss16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlss16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int16x4_t = vmls_s16 (arg0_int16x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlss32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlss32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlss32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int32x2_t = vmls_s32 (arg0_int32x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlss8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlss8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlss8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vmls_s8 (arg0_int8x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsu16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+- uint16x4_t arg2_uint16x4_t;
+-
+- out_uint16x4_t = vmls_u16 (arg0_uint16x4_t, arg1_uint16x4_t, arg2_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsu32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+- uint32x2_t arg2_uint32x2_t;
+-
+- out_uint32x2_t = vmls_u32 (arg0_uint32x2_t, arg1_uint32x2_t, arg2_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmlsu8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vmlsu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmlsu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vmls_u8 (arg0_uint8x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmls\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_nf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_nf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32_t arg0_float32_t;
+-
+- out_float32x4_t = vmovq_n_f32 (arg0_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_np16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_np16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16_t arg0_poly16_t;
+-
+- out_poly16x8_t = vmovq_n_p16 (arg0_poly16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_np8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_np8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8_t arg0_poly8_t;
+-
+- out_poly8x16_t = vmovq_n_p8 (arg0_poly8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16_t arg0_int16_t;
+-
+- out_int16x8_t = vmovq_n_s16 (arg0_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32_t arg0_int32_t;
+-
+- out_int32x4_t = vmovq_n_s32 (arg0_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_ns64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vmovQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64_t arg0_int64_t;
+-
+- out_int64x2_t = vmovq_n_s64 (arg0_int64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8_t arg0_int8_t;
+-
+- out_int8x16_t = vmovq_n_s8 (arg0_int8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16_t arg0_uint16_t;
+-
+- out_uint16x8_t = vmovq_n_u16 (arg0_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32_t arg0_uint32_t;
+-
+- out_uint32x4_t = vmovq_n_u32 (arg0_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_nu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vmovQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint64x2_t = vmovq_n_u64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8_t arg0_uint8_t;
+-
+- out_uint8x16_t = vmovq_n_u8 (arg0_uint8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[qQ\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_nf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_nf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32_t arg0_float32_t;
+-
+- out_float32x2_t = vmov_n_f32 (arg0_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_np16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_np16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16_t arg0_poly16_t;
+-
+- out_poly16x4_t = vmov_n_p16 (arg0_poly16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_np8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_np8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8_t arg0_poly8_t;
+-
+- out_poly8x8_t = vmov_n_p8 (arg0_poly8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16_t arg0_int16_t;
+-
+- out_int16x4_t = vmov_n_s16 (arg0_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32_t arg0_int32_t;
+-
+- out_int32x2_t = vmov_n_s32 (arg0_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_ns64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vmov_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64_t arg0_int64_t;
+-
+- out_int64x1_t = vmov_n_s64 (arg0_int64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8_t arg0_int8_t;
+-
+- out_int8x8_t = vmov_n_s8 (arg0_int8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16_t arg0_uint16_t;
+-
+- out_uint16x4_t = vmov_n_u16 (arg0_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.16\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32_t arg0_uint32_t;
+-
+- out_uint32x2_t = vmov_n_u32 (arg0_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.32\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_nu64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vmov_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64_t arg0_uint64_t;
+-
+- out_uint64x1_t = vmov_n_u64 (arg0_uint64_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmov_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmov_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmov_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8_t arg0_uint8_t;
+-
+- out_uint8x8_t = vmov_n_u8 (arg0_uint8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vdup\.8\[ \]+\[dD\]\[0-9\]+, (\[rR\]\[0-9\]+|\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovls16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int32x4_t = vmovl_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovls32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int64x2_t = vmovl_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovls8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int16x8_t = vmovl_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovlu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovlu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint32x4_t = vmovl_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovlu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovlu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint64x2_t = vmovl_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovlu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovlu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint16x8_t = vmovl_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovl\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vmovn_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vmovn_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vmovn_s64 (arg0_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovnu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vmovn_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovnu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vmovn_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmovnu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmovnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmovnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vmovn_u64 (arg0_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmovn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_lanef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x4_t = vmulq_lane_f32 (arg0_float32x4_t, arg1_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x8_t = vmulq_lane_s16 (arg0_int16x8_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x4_t = vmulq_lane_s32 (arg0_int32x4_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_laneu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x8_t = vmulq_lane_u16 (arg0_uint16x8_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_laneu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x4_t = vmulq_lane_u32 (arg0_uint32x4_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_nf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_nf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32_t arg1_float32_t;
+-
+- out_float32x4_t = vmulq_n_f32 (arg0_float32x4_t, arg1_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x8_t = vmulq_n_s16 (arg0_int16x8_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x4_t = vmulq_n_s32 (arg0_int32x4_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16_t arg1_uint16_t;
+-
+- out_uint16x8_t = vmulq_n_u16 (arg0_uint16x8_t, arg1_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQ_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32_t arg1_uint32_t;
+-
+- out_uint32x4_t = vmulq_n_u32 (arg0_uint32x4_t, arg1_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vmulq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vmulq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.p8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vmulq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vmulq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vmulq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vmulq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vmulq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vmulq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_lanef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vmul_lane_f32 (arg0_float32x2_t, arg1_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vmul_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vmul_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_laneu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vmul_lane_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_laneu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vmul_lane_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_nf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_nf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_nf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32_t arg1_float32_t;
+-
+- out_float32x2_t = vmul_n_f32 (arg0_float32x2_t, arg1_float32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x4_t = vmul_n_s16 (arg0_int16x4_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x2_t = vmul_n_s32 (arg0_int32x2_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16_t arg1_uint16_t;
+-
+- out_uint16x4_t = vmul_n_u16 (arg0_uint16x4_t, arg1_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmul_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmul_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmul_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32_t arg1_uint32_t;
+-
+- out_uint32x2_t = vmul_n_u32 (arg0_uint32x2_t, arg1_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vmul_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vmull_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vmull_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_laneu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_laneu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vmull_lane_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_laneu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_laneu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vmull_lane_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16_t arg1_int16_t;
+-
+- out_int32x4_t = vmull_n_s16 (arg0_int16x4_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32_t arg1_int32_t;
+-
+- out_int64x2_t = vmull_n_s32 (arg0_int32x2_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_nu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16_t arg1_uint16_t;
+-
+- out_uint32x4_t = vmull_n_u16 (arg0_uint16x4_t, arg1_uint16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmull_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmull_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmull_nu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32_t arg1_uint32_t;
+-
+- out_uint64x2_t = vmull_n_u32 (arg0_uint32x2_t, arg1_uint32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmullp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmullp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmullp8 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly16x8_t = vmull_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.p8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vmull_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vmull_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vmull_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmullu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmullu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmullu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vmull_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmullu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmullu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmullu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vmull_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmullu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmullu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmullu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vmull_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmull\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vmul_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.p8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmuls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmuls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmuls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vmul_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmuls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmuls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmuls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vmul_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmuls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmuls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmuls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vmul_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vmul_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vmul_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmulu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vmulu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmulu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vmul_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmul\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x16_t = vmvnq_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vmvnq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vmvnq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vmvnq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vmvnq_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vmvnq_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnQu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vmvnq_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnp8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vmvn_p8 (arg0_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vmvn_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vmvn_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vmvn_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vmvn_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vmvn_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vmvnu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vmvnu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vmvnu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vmvn_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vmvn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegQf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vnegq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vnegq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vnegq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vnegq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vneg_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vneg_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vneg_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vnegs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vnegs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vnegs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vneg_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vneg\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int16x8_t out_int16x8_t;
+-int16x8_t arg0_int16x8_t;
+-int16x8_t arg1_int16x8_t;
+-void test_vornQs16 (void)
+-{
+-
+- out_int16x8_t = vornq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int32x4_t out_int32x4_t;
+-int32x4_t arg0_int32x4_t;
+-int32x4_t arg1_int32x4_t;
+-void test_vornQs32 (void)
+-{
+-
+- out_int32x4_t = vornq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int64x2_t out_int64x2_t;
+-int64x2_t arg0_int64x2_t;
+-int64x2_t arg1_int64x2_t;
+-void test_vornQs64 (void)
+-{
+-
+- out_int64x2_t = vornq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int8x16_t out_int8x16_t;
+-int8x16_t arg0_int8x16_t;
+-int8x16_t arg1_int8x16_t;
+-void test_vornQs8 (void)
+-{
+-
+- out_int8x16_t = vornq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint16x8_t out_uint16x8_t;
+-uint16x8_t arg0_uint16x8_t;
+-uint16x8_t arg1_uint16x8_t;
+-void test_vornQu16 (void)
+-{
+-
+- out_uint16x8_t = vornq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint32x4_t out_uint32x4_t;
+-uint32x4_t arg0_uint32x4_t;
+-uint32x4_t arg1_uint32x4_t;
+-void test_vornQu32 (void)
+-{
+-
+- out_uint32x4_t = vornq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint64x2_t out_uint64x2_t;
+-uint64x2_t arg0_uint64x2_t;
+-uint64x2_t arg1_uint64x2_t;
+-void test_vornQu64 (void)
+-{
+-
+- out_uint64x2_t = vornq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint8x16_t out_uint8x16_t;
+-uint8x16_t arg0_uint8x16_t;
+-uint8x16_t arg1_uint8x16_t;
+-void test_vornQu8 (void)
+-{
+-
+- out_uint8x16_t = vornq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int16x4_t out_int16x4_t;
+-int16x4_t arg0_int16x4_t;
+-int16x4_t arg1_int16x4_t;
+-void test_vorns16 (void)
+-{
+-
+- out_int16x4_t = vorn_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int32x2_t out_int32x2_t;
+-int32x2_t arg0_int32x2_t;
+-int32x2_t arg1_int32x2_t;
+-void test_vorns32 (void)
+-{
+-
+- out_int32x2_t = vorn_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vorns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int64x1_t out_int64x1_t;
+-int64x1_t arg0_int64x1_t;
+-int64x1_t arg1_int64x1_t;
+-void test_vorns64 (void)
+-{
+-
+- out_int64x1_t = vorn_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-int8x8_t out_int8x8_t;
+-int8x8_t arg0_int8x8_t;
+-int8x8_t arg1_int8x8_t;
+-void test_vorns8 (void)
+-{
+-
+- out_int8x8_t = vorn_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint16x4_t out_uint16x4_t;
+-uint16x4_t arg0_uint16x4_t;
+-uint16x4_t arg1_uint16x4_t;
+-void test_vornu16 (void)
+-{
+-
+- out_uint16x4_t = vorn_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint32x2_t out_uint32x2_t;
+-uint32x2_t arg0_uint32x2_t;
+-uint32x2_t arg1_uint32x2_t;
+-void test_vornu32 (void)
+-{
+-
+- out_uint32x2_t = vorn_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vornu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint64x1_t out_uint64x1_t;
+-uint64x1_t arg0_uint64x1_t;
+-uint64x1_t arg1_uint64x1_t;
+-void test_vornu64 (void)
+-{
+-
+- out_uint64x1_t = vorn_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vornu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vornu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O2" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-uint8x8_t out_uint8x8_t;
+-uint8x8_t arg0_uint8x8_t;
+-uint8x8_t arg1_uint8x8_t;
+-void test_vornu8 (void)
+-{
+-
+- out_uint8x8_t = vorn_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorn\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vorrq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vorrq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vorrq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vorrq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vorrq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vorrq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vorrq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vorrq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vorr_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vorr_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrs64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vorrs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrs64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vorr_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorrs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorrs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorrs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vorr_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorru16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorru16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorru16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vorr_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorru32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorru32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorru32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vorr_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorru64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vorru64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorru64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vorr_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vorru8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vorru8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vorru8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vorr_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vorr\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQs16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int32x4_t = vpadalq_s16 (arg0_int32x4_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQs32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int64x2_t = vpadalq_s32 (arg0_int64x2_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQs8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int16x8_t = vpadalq_s8 (arg0_int16x8_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint32x4_t = vpadalq_u16 (arg0_uint32x4_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint64x2_t = vpadalq_u32 (arg0_uint64x2_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalQu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint16x8_t = vpadalq_u8 (arg0_uint16x8_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadals16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadals16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadals16 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x2_t = vpadal_s16 (arg0_int32x2_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadals32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadals32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadals32 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x1_t = vpadal_s32 (arg0_int64x1_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadals8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadals8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadals8 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x4_t = vpadal_s8 (arg0_int16x4_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalu16 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x2_t = vpadal_u16 (arg0_uint32x2_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalu32 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x1_t = vpadal_u32 (arg0_uint64x1_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadalu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadalu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadalu8 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x4_t = vpadal_u8 (arg0_uint16x4_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadal\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpaddf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vpadd_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQs16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int32x4_t = vpaddlq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQs32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int64x2_t = vpaddlq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQs8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int16x8_t = vpaddlq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint32x4_t = vpaddlq_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint64x2_t = vpaddlq_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlQu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlQu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint16x8_t = vpaddlq_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddls16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddls16 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int32x2_t = vpaddl_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddls32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddls32 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int64x1_t = vpaddl_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddls8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddls8 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int16x4_t = vpaddl_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlu16 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint32x2_t = vpaddl_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlu32 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint64x1_t = vpaddl_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddlu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vpaddlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddlu8 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint16x4_t = vpaddl_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpaddl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vpadd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vpadd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpadds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpadds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpadds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vpadd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpaddu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vpadd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpaddu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vpadd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpaddu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpaddu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpaddu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vpadd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpadd\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vpmax_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vpmax_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vpmax_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vpmax_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vpmax_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vpmax_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmaxu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmaxu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmaxu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vpmax_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmax\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpminf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpminf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpminf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vpmin_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmins16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmins16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmins16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vpmin_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmins32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmins32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmins32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vpmin_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpmins8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpmins8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpmins8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vpmin_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpminu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpminu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpminu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vpmin_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpminu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpminu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpminu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vpmin_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vpminu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vpminu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vpminu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vpmin_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vpmin\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQ_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x8_t = vqrdmulhq_lane_s16 (arg0_int16x8_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQ_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x4_t = vqrdmulhq_lane_s32 (arg0_int32x4_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x8_t = vqrdmulhq_n_s16 (arg0_int16x8_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x4_t = vqrdmulhq_n_s32 (arg0_int32x4_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqrdmulhq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqrdmulhq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulh_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulh_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulh_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqrdmulh_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulh_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulh_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulh_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqrdmulh_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulh_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulh_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulh_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x4_t = vqrdmulh_n_s16 (arg0_int16x4_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulh_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulh_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulh_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x2_t = vqrdmulh_n_s32 (arg0_int32x2_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqrdmulh_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRdmulhs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRdmulhs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRdmulhs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqrdmulh_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqrshlq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqrshlq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vqrshlq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vqrshlq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vqrshlq_u16 (arg0_uint16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vqrshlq_u32 (arg0_uint32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_uint64x2_t = vqrshlq_u64 (arg0_uint64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vqrshlq_u8 (arg0_uint8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqrshl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqrshl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshls64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshls64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshls64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vqrshl_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vqrshl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vqrshl_u16 (arg0_uint16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vqrshl_u32 (arg0_uint32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_uint64x1_t = vqrshl_u64 (arg0_uint64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqRshlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshlu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vqrshl_u8 (arg0_uint8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_ns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vqrshrn_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_ns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vqrshrn_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_ns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vqrshrn_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_nu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vqrshrn_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.u16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_nu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vqrshrn_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.u32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrn_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrn_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrn_nu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vqrshrn_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrn\.u64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrun_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrun_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrun_ns16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint8x8_t = vqrshrun_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrun\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrun_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrun_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrun_ns32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint16x4_t = vqrshrun_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrun\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqRshrun_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqRshrun_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqRshrun_ns64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint32x2_t = vqrshrun_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqrshrun\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabsQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabsQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabsQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vqabsq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabsQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabsQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabsQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vqabsq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabsQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabsQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabsQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vqabsq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabss16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabss16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabss16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vqabs_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabss32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabss32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabss32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vqabs_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqabss8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqabss8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqabss8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vqabs_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqabs\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqaddq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqaddq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vqaddq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vqaddq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vqaddq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vqaddq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vqaddq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vqaddq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqadds16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqadds16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqadds16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqadd_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqadds32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqadds32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqadds32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqadd_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqadds64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqadds64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqadds64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vqadd_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqadds8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqadds8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqadds8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vqadd_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vqadd_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vqadd_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vqadd_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqaddu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqaddu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqaddu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vqadd_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqadd\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlal_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlal_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlal_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vqdmlal_lane_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlal_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlal_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlal_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vqdmlal_lane_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlal_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlal_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlal_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int32x4_t = vqdmlal_n_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlal_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlal_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlal_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int64x2_t = vqdmlal_n_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlals16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlals16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlals16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vqdmlal_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlals32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlals32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlals32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vqdmlal_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlal\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsl_lanes16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsl_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsl_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vqdmlsl_lane_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsl_lanes32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsl_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsl_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vqdmlsl_lane_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsl_ns16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsl_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsl_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16_t arg2_int16_t;
+-
+- out_int32x4_t = vqdmlsl_n_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsl_ns32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsl_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsl_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32_t arg2_int32_t;
+-
+- out_int64x2_t = vqdmlsl_n_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsls16.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+- int16x4_t arg2_int16x4_t;
+-
+- out_int32x4_t = vqdmlsl_s16 (arg0_int32x4_t, arg1_int16x4_t, arg2_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmlsls32.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vqdmlsls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmlsls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+- int32x2_t arg2_int32x2_t;
+-
+- out_int64x2_t = vqdmlsl_s32 (arg0_int64x2_t, arg1_int32x2_t, arg2_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmlsl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQ_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x8_t = vqdmulhq_lane_s16 (arg0_int16x8_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQ_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x4_t = vqdmulhq_lane_s32 (arg0_int32x4_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x8_t = vqdmulhq_n_s16 (arg0_int16x8_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x4_t = vqdmulhq_n_s32 (arg0_int32x4_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqdmulhq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqdmulhq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulh_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulh_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulh_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqdmulh_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulh_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulh_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulh_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqdmulh_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulh_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulh_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulh_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16_t arg1_int16_t;
+-
+- out_int16x4_t = vqdmulh_n_s16 (arg0_int16x4_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulh_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulh_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulh_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32_t arg1_int32_t;
+-
+- out_int32x2_t = vqdmulh_n_s32 (arg0_int32x2_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqdmulh_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulhs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulhs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulhs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqdmulh_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmulh\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmull_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmull_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmull_lanes16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vqdmull_lane_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmull_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmull_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmull_lanes32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vqdmull_lane_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmull_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmull_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmull_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16_t arg1_int16_t;
+-
+- out_int32x4_t = vqdmull_n_s16 (arg0_int16x4_t, arg1_int16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmull_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmull_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmull_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32_t arg1_int32_t;
+-
+- out_int64x2_t = vqdmull_n_s32 (arg0_int32x2_t, arg1_int32_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vqdmull_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqdmulls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqdmulls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqdmulls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vqdmull_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqdmull\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vqmovn_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vqmovn_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vqmovn_s64 (arg0_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovnu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vqmovn_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.u16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovnu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vqmovn_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.u32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovnu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vqmovn_u64 (arg0_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovn\.u64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovuns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovuns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovuns16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint8x8_t = vqmovun_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovun\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovuns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovuns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovuns32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint16x4_t = vqmovun_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovun\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqmovuns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqmovuns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqmovuns64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint32x2_t = vqmovun_s64 (arg0_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqmovun\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegQs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vqnegq_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegQs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vqnegq_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegQs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vqnegq_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vqneg_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vqneg_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqnegs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqnegs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqnegs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vqneg_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqneg\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vqshlq_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vqshlq_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x2_t = vqshlq_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vqshlq_n_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vqshlq_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vqshlq_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x2_t = vqshlq_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vqshlq_n_u8 (arg0_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqshlq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqshlq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vqshlq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vqshlq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vqshlq_u16 (arg0_uint16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vqshlq_u32 (arg0_uint32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_uint64x2_t = vqshlq_u64 (arg0_uint64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vqshlq_u8 (arg0_uint8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vqshl_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vqshl_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x1_t = vqshl_n_s64 (arg0_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vqshl_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vqshl_n_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vqshl_n_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x1_t = vqshl_n_u64 (arg0_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshl_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshl_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshl_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vqshl_n_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqshl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqshl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshls64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshls64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshls64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vqshl_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vqshl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vqshl_u16 (arg0_uint16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vqshl_u32 (arg0_uint32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_uint64x1_t = vqshl_u64 (arg0_uint64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqshlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vqshl_u8 (arg0_uint8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqshl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshluQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshluQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshluQ_ns16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint16x8_t = vqshluq_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshluQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshluQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshluQ_ns32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint32x4_t = vqshluq_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshluQ_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshluQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshluQ_ns64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint64x2_t = vqshluq_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshluQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshluQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshluQ_ns8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_uint8x16_t = vqshluq_n_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlu_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu_ns16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_uint16x4_t = vqshlu_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlu_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu_ns32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_uint32x2_t = vqshlu_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlu_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu_ns64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_uint64x1_t = vqshlu_n_s64 (arg0_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshlu_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshlu_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshlu_ns8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_uint8x8_t = vqshlu_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshlu\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_ns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vqshrn_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_ns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vqshrn_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_ns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vqshrn_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_nu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vqshrn_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.u16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_nu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vqshrn_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.u32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrn_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrn_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrn_nu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vqshrn_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrn\.u64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrun_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrun_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrun_ns16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint8x8_t = vqshrun_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrun\.s16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrun_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrun_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrun_ns32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint16x4_t = vqshrun_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrun\.s32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqshrun_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vqshrun_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqshrun_ns64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint32x2_t = vqshrun_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vqshrun\.s64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vqsubq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vqsubq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vqsubq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vqsubq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vqsubq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vqsubq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vqsubq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vqsubq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vqsub_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vqsub_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubs64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vqsub_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vqsub_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vqsub_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vqsub_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vqsub_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vqsubu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vqsubu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vqsubu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vqsub_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vqsub\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpeQf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrecpeQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpeQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrecpeq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecpe\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpeQu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrecpeQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpeQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vrecpeq_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecpe\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrecpef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrecpe_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecpe\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpeu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrecpeu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpeu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vrecpe_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecpe\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpsQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vrecpsQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpsQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vrecpsq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecps\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrecpsf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vrecpsf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrecpsf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vrecps_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrecps\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_p128 (void)
+-{
+- float32x4_t out_float32x4_t;
+- poly128_t arg0_poly128_t;
+-
+- out_float32x4_t = vreinterpretq_f32_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_p16 (void)
+-{
+- float32x4_t out_float32x4_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_float32x4_t = vreinterpretq_f32_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_p64 (void)
+-{
+- float32x4_t out_float32x4_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_float32x4_t = vreinterpretq_f32_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_p8 (void)
+-{
+- float32x4_t out_float32x4_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_float32x4_t = vreinterpretq_f32_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_s16 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_float32x4_t = vreinterpretq_f32_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_s32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_float32x4_t = vreinterpretq_f32_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_s64 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_float32x4_t = vreinterpretq_f32_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_s8 (void)
+-{
+- float32x4_t out_float32x4_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_float32x4_t = vreinterpretq_f32_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_u16 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_float32x4_t = vreinterpretq_f32_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_u32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_float32x4_t = vreinterpretq_f32_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_u64 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_float32x4_t = vreinterpretq_f32_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQf32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQf32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQf32_u8 (void)
+-{
+- float32x4_t out_float32x4_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_float32x4_t = vreinterpretq_f32_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_f32 (void)
+-{
+- poly128_t out_poly128_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_poly128_t = vreinterpretq_p128_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_p16 (void)
+-{
+- poly128_t out_poly128_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly128_t = vreinterpretq_p128_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_p64 (void)
+-{
+- poly128_t out_poly128_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_poly128_t = vreinterpretq_p128_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_p8 (void)
+-{
+- poly128_t out_poly128_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly128_t = vreinterpretq_p128_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_s16 (void)
+-{
+- poly128_t out_poly128_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_poly128_t = vreinterpretq_p128_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_s32 (void)
+-{
+- poly128_t out_poly128_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_poly128_t = vreinterpretq_p128_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_s64 (void)
+-{
+- poly128_t out_poly128_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_poly128_t = vreinterpretq_p128_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_s8 (void)
+-{
+- poly128_t out_poly128_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_poly128_t = vreinterpretq_p128_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_u16 (void)
+-{
+- poly128_t out_poly128_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_poly128_t = vreinterpretq_p128_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_u32 (void)
+-{
+- poly128_t out_poly128_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_poly128_t = vreinterpretq_p128_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_u64 (void)
+-{
+- poly128_t out_poly128_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_poly128_t = vreinterpretq_p128_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp128_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp128_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp128_u8 (void)
+-{
+- poly128_t out_poly128_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_poly128_t = vreinterpretq_p128_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_f32 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_p128 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly128_t arg0_poly128_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_p64 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_p8 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_s16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_s32 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_s64 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_s8 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_u16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_u32 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_u64 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp16_u8 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_poly16x8_t = vreinterpretq_p16_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_f32 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_p128 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly128_t arg0_poly128_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_p16 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_p8 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_s16 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_s32 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_s64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_s8 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_u16 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_u32 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_u64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp64_u8 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_poly64x2_t = vreinterpretq_p64_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_f32 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_p128 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly128_t arg0_poly128_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_p16 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_p64 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_s16 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_s32 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_s64 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_s8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_u16 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_u32 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_u64 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQp8_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQp8_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQp8_u8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_poly8x16_t = vreinterpretq_p8_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_f32 (void)
+-{
+- int16x8_t out_int16x8_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int16x8_t = vreinterpretq_s16_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_p128 (void)
+-{
+- int16x8_t out_int16x8_t;
+- poly128_t arg0_poly128_t;
+-
+- out_int16x8_t = vreinterpretq_s16_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_p16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_int16x8_t = vreinterpretq_s16_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_p64 (void)
+-{
+- int16x8_t out_int16x8_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_int16x8_t = vreinterpretq_s16_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_p8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_int16x8_t = vreinterpretq_s16_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_s32 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x8_t = vreinterpretq_s16_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_s64 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int16x8_t = vreinterpretq_s16_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_s8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int16x8_t = vreinterpretq_s16_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_u16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_int16x8_t = vreinterpretq_s16_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_u32 (void)
+-{
+- int16x8_t out_int16x8_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_int16x8_t = vreinterpretq_s16_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_u64 (void)
+-{
+- int16x8_t out_int16x8_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_int16x8_t = vreinterpretq_s16_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs16_u8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_int16x8_t = vreinterpretq_s16_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_f32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int32x4_t = vreinterpretq_s32_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_p128 (void)
+-{
+- int32x4_t out_int32x4_t;
+- poly128_t arg0_poly128_t;
+-
+- out_int32x4_t = vreinterpretq_s32_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_p16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_int32x4_t = vreinterpretq_s32_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_p64 (void)
+-{
+- int32x4_t out_int32x4_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_int32x4_t = vreinterpretq_s32_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_p8 (void)
+-{
+- int32x4_t out_int32x4_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_int32x4_t = vreinterpretq_s32_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_s16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int32x4_t = vreinterpretq_s32_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_s64 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x4_t = vreinterpretq_s32_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_s8 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int32x4_t = vreinterpretq_s32_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_u16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_int32x4_t = vreinterpretq_s32_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_u32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_int32x4_t = vreinterpretq_s32_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_u64 (void)
+-{
+- int32x4_t out_int32x4_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_int32x4_t = vreinterpretq_s32_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs32_u8 (void)
+-{
+- int32x4_t out_int32x4_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_int32x4_t = vreinterpretq_s32_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_f32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int64x2_t = vreinterpretq_s64_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_p128 (void)
+-{
+- int64x2_t out_int64x2_t;
+- poly128_t arg0_poly128_t;
+-
+- out_int64x2_t = vreinterpretq_s64_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_p16 (void)
+-{
+- int64x2_t out_int64x2_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_int64x2_t = vreinterpretq_s64_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_p64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_int64x2_t = vreinterpretq_s64_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_p8 (void)
+-{
+- int64x2_t out_int64x2_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_int64x2_t = vreinterpretq_s64_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_s16 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int64x2_t = vreinterpretq_s64_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_s32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int64x2_t = vreinterpretq_s64_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_s8 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int64x2_t = vreinterpretq_s64_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_u16 (void)
+-{
+- int64x2_t out_int64x2_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_int64x2_t = vreinterpretq_s64_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_u32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_int64x2_t = vreinterpretq_s64_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_u64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_int64x2_t = vreinterpretq_s64_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs64_u8 (void)
+-{
+- int64x2_t out_int64x2_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_int64x2_t = vreinterpretq_s64_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_f32 (void)
+-{
+- int8x16_t out_int8x16_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_int8x16_t = vreinterpretq_s8_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_p128 (void)
+-{
+- int8x16_t out_int8x16_t;
+- poly128_t arg0_poly128_t;
+-
+- out_int8x16_t = vreinterpretq_s8_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_p16 (void)
+-{
+- int8x16_t out_int8x16_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_int8x16_t = vreinterpretq_s8_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_p64 (void)
+-{
+- int8x16_t out_int8x16_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_int8x16_t = vreinterpretq_s8_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_p8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_int8x16_t = vreinterpretq_s8_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_s16 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x16_t = vreinterpretq_s8_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_s32 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int8x16_t = vreinterpretq_s8_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_s64 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int8x16_t = vreinterpretq_s8_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_u16 (void)
+-{
+- int8x16_t out_int8x16_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_int8x16_t = vreinterpretq_s8_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_u32 (void)
+-{
+- int8x16_t out_int8x16_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_int8x16_t = vreinterpretq_s8_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_u64 (void)
+-{
+- int8x16_t out_int8x16_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_int8x16_t = vreinterpretq_s8_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQs8_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQs8_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQs8_u8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_int8x16_t = vreinterpretq_s8_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_f32 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_p128 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- poly128_t arg0_poly128_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_p16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_p64 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_p8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_s16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_s32 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_s64 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_s8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_u32 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_u64 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu16_u8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint16x8_t = vreinterpretq_u16_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_f32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_p128 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- poly128_t arg0_poly128_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_p16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_p64 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_p8 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_s16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_s32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_s64 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_s8 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_u16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_u64 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu32_u8 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint32x4_t = vreinterpretq_u32_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_f32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_p128 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- poly128_t arg0_poly128_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_p16 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_p64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_p8 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_s16 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_s32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_s64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_s8 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_u16 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_u32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu64_u8 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint64x2_t = vreinterpretq_u64_u8 (arg0_uint8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_f32 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_f32 (arg0_float32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_p128.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_p128' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_p128 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly128_t arg0_poly128_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_p128 (arg0_poly128_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_p16 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_p16 (arg0_poly16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_p64 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly64x2_t arg0_poly64x2_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_p64 (arg0_poly64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_p8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_p8 (arg0_poly8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_s16 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_s16 (arg0_int16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_s32 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_s32 (arg0_int32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_s64 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_s64 (arg0_int64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_s8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_s8 (arg0_int8x16_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_u16 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_u16 (arg0_uint16x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_u32 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_u32 (arg0_uint32x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretQu8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretQu8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretQu8_u64 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint8x16_t = vreinterpretq_u8_u64 (arg0_uint64x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_p16 (void)
+-{
+- float32x2_t out_float32x2_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_float32x2_t = vreinterpret_f32_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_p64 (void)
+-{
+- float32x2_t out_float32x2_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_float32x2_t = vreinterpret_f32_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_p8 (void)
+-{
+- float32x2_t out_float32x2_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_float32x2_t = vreinterpret_f32_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_s16 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_float32x2_t = vreinterpret_f32_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_s32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_float32x2_t = vreinterpret_f32_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_s64 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_float32x2_t = vreinterpret_f32_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_s8 (void)
+-{
+- float32x2_t out_float32x2_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_float32x2_t = vreinterpret_f32_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_u16 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_float32x2_t = vreinterpret_f32_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_u32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_float32x2_t = vreinterpret_f32_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_u64 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_float32x2_t = vreinterpret_f32_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretf32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretf32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretf32_u8 (void)
+-{
+- float32x2_t out_float32x2_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_float32x2_t = vreinterpret_f32_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_f32 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_poly16x4_t = vreinterpret_p16_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_p64 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_poly16x4_t = vreinterpret_p16_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_p8 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly16x4_t = vreinterpret_p16_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_s16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_poly16x4_t = vreinterpret_p16_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_s32 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_poly16x4_t = vreinterpret_p16_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_s64 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_poly16x4_t = vreinterpret_p16_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_s8 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_poly16x4_t = vreinterpret_p16_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_u16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_poly16x4_t = vreinterpret_p16_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_u32 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_poly16x4_t = vreinterpret_p16_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_u64 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_poly16x4_t = vreinterpret_p16_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp16_u8 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_poly16x4_t = vreinterpret_p16_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_f32 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_poly64x1_t = vreinterpret_p64_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_p16 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly64x1_t = vreinterpret_p64_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_p8 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly64x1_t = vreinterpret_p64_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_s16 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_poly64x1_t = vreinterpret_p64_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_s32 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_poly64x1_t = vreinterpret_p64_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_s64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_poly64x1_t = vreinterpret_p64_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_s8 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_poly64x1_t = vreinterpret_p64_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_u16 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_poly64x1_t = vreinterpret_p64_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_u32 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_poly64x1_t = vreinterpret_p64_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_u64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_poly64x1_t = vreinterpret_p64_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp64_u8 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_poly64x1_t = vreinterpret_p64_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_f32 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_poly8x8_t = vreinterpret_p8_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_p16 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly8x8_t = vreinterpret_p8_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_p64 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_poly8x8_t = vreinterpret_p8_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_s16 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_poly8x8_t = vreinterpret_p8_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_s32 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_poly8x8_t = vreinterpret_p8_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_s64 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_poly8x8_t = vreinterpret_p8_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_s8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_poly8x8_t = vreinterpret_p8_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_u16 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_poly8x8_t = vreinterpret_p8_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_u32 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_poly8x8_t = vreinterpret_p8_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_u64 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_poly8x8_t = vreinterpret_p8_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretp8_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretp8_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretp8_u8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_poly8x8_t = vreinterpret_p8_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_f32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int16x4_t = vreinterpret_s16_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_p16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_int16x4_t = vreinterpret_s16_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_p64 (void)
+-{
+- int16x4_t out_int16x4_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_int16x4_t = vreinterpret_s16_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_p8 (void)
+-{
+- int16x4_t out_int16x4_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_int16x4_t = vreinterpret_s16_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_s32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int16x4_t = vreinterpret_s16_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_s64 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int16x4_t = vreinterpret_s16_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_s8 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int16x4_t = vreinterpret_s16_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_u16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_int16x4_t = vreinterpret_s16_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_u32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_int16x4_t = vreinterpret_s16_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_u64 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_int16x4_t = vreinterpret_s16_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets16_u8 (void)
+-{
+- int16x4_t out_int16x4_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_int16x4_t = vreinterpret_s16_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_f32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int32x2_t = vreinterpret_s32_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_p16 (void)
+-{
+- int32x2_t out_int32x2_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_int32x2_t = vreinterpret_s32_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_p64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_int32x2_t = vreinterpret_s32_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_p8 (void)
+-{
+- int32x2_t out_int32x2_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_int32x2_t = vreinterpret_s32_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_s16 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int32x2_t = vreinterpret_s32_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_s64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int32x2_t = vreinterpret_s32_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_s8 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int32x2_t = vreinterpret_s32_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_u16 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_int32x2_t = vreinterpret_s32_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_u32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_int32x2_t = vreinterpret_s32_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_u64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_int32x2_t = vreinterpret_s32_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets32_u8 (void)
+-{
+- int32x2_t out_int32x2_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_int32x2_t = vreinterpret_s32_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_f32 (void)
+-{
+- int64x1_t out_int64x1_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int64x1_t = vreinterpret_s64_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_p16 (void)
+-{
+- int64x1_t out_int64x1_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_int64x1_t = vreinterpret_s64_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_p64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_int64x1_t = vreinterpret_s64_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_p8 (void)
+-{
+- int64x1_t out_int64x1_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_int64x1_t = vreinterpret_s64_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_s16 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int64x1_t = vreinterpret_s64_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_s32 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int64x1_t = vreinterpret_s64_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_s8 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int64x1_t = vreinterpret_s64_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_u16 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_int64x1_t = vreinterpret_s64_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_u32 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_int64x1_t = vreinterpret_s64_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_u64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_int64x1_t = vreinterpret_s64_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets64_u8 (void)
+-{
+- int64x1_t out_int64x1_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_int64x1_t = vreinterpret_s64_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_f32 (void)
+-{
+- int8x8_t out_int8x8_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_int8x8_t = vreinterpret_s8_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_p16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_int8x8_t = vreinterpret_s8_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_p64 (void)
+-{
+- int8x8_t out_int8x8_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_int8x8_t = vreinterpret_s8_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_p8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_int8x8_t = vreinterpret_s8_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_s16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int8x8_t = vreinterpret_s8_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_s32 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int8x8_t = vreinterpret_s8_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_s64 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int8x8_t = vreinterpret_s8_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_u16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_int8x8_t = vreinterpret_s8_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_u32 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_int8x8_t = vreinterpret_s8_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_u64 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_int8x8_t = vreinterpret_s8_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterprets8_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterprets8_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterprets8_u8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_int8x8_t = vreinterpret_s8_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_f32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint16x4_t = vreinterpret_u16_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_p16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_uint16x4_t = vreinterpret_u16_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_p64 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_uint16x4_t = vreinterpret_u16_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_p8 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_uint16x4_t = vreinterpret_u16_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_s16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_uint16x4_t = vreinterpret_u16_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_s32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_uint16x4_t = vreinterpret_u16_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_s64 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_uint16x4_t = vreinterpret_u16_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_s8 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_uint16x4_t = vreinterpret_u16_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_u32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint16x4_t = vreinterpret_u16_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_u64 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint16x4_t = vreinterpret_u16_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu16_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu16_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu16_u8 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint16x4_t = vreinterpret_u16_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_f32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint32x2_t = vreinterpret_u32_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_p16 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_uint32x2_t = vreinterpret_u32_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_p64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_uint32x2_t = vreinterpret_u32_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_p8 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_uint32x2_t = vreinterpret_u32_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_s16 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_uint32x2_t = vreinterpret_u32_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_s32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_uint32x2_t = vreinterpret_u32_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_s64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_uint32x2_t = vreinterpret_u32_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_s8 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_uint32x2_t = vreinterpret_u32_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_u16 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint32x2_t = vreinterpret_u32_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_u64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint32x2_t = vreinterpret_u32_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu32_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu32_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu32_u8 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint32x2_t = vreinterpret_u32_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_f32 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint64x1_t = vreinterpret_u64_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_p16 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_uint64x1_t = vreinterpret_u64_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_p64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_uint64x1_t = vreinterpret_u64_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_p8 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_uint64x1_t = vreinterpret_u64_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_s16 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_uint64x1_t = vreinterpret_u64_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_s32 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_uint64x1_t = vreinterpret_u64_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_s64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_uint64x1_t = vreinterpret_u64_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_s8 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_uint64x1_t = vreinterpret_u64_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_u16 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint64x1_t = vreinterpret_u64_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_u32 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint64x1_t = vreinterpret_u64_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu64_u8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu64_u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu64_u8 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint64x1_t = vreinterpret_u64_u8 (arg0_uint8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_f32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_f32 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_uint8x8_t = vreinterpret_u8_f32 (arg0_float32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_p16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_p16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_uint8x8_t = vreinterpret_u8_p16 (arg0_poly16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_p64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_p64 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- poly64x1_t arg0_poly64x1_t;
+-
+- out_uint8x8_t = vreinterpret_u8_p64 (arg0_poly64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_p8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_p8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_uint8x8_t = vreinterpret_u8_p8 (arg0_poly8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_s16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_s16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_uint8x8_t = vreinterpret_u8_s16 (arg0_int16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_s32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_s32 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_uint8x8_t = vreinterpret_u8_s32 (arg0_int32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_s64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_s64 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_uint8x8_t = vreinterpret_u8_s64 (arg0_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_s8.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_s8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_uint8x8_t = vreinterpret_u8_s8 (arg0_int8x8_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_u16.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_u16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint8x8_t = vreinterpret_u8_u16 (arg0_uint16x4_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_u32.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_u32 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint8x8_t = vreinterpret_u8_u32 (arg0_uint32x2_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vreinterpretu8_u64.c
++++ b/src//dev/null
+@@ -1,18 +0,0 @@
+-/* Test the `vreinterpretu8_u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vreinterpretu8_u64 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint8x8_t = vreinterpret_u8_u64 (arg0_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16Qp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x16_t = vrev16q_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16Qs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vrev16q_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16Qu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vrev16q_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vrev16_p8 (arg0_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vrev16_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev16u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev16u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev16u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vrev16_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev16\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly16x8_t = vrev32q_p16 (arg0_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x16_t = vrev32q_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vrev32q_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vrev32q_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vrev32q_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32Qu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vrev32q_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32p16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly16x4_t = vrev32_p16 (arg0_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vrev32_p8 (arg0_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32s16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vrev32_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vrev32_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32u16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vrev32_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev32u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev32u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev32u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vrev32_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev32\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrev64q_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qp16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+-
+- out_poly16x8_t = vrev64q_p16 (arg0_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qp8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+-
+- out_poly8x16_t = vrev64q_p8 (arg0_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vrev64q_s16 (arg0_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vrev64q_s32 (arg0_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vrev64q_s8 (arg0_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vrev64q_u16 (arg0_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vrev64q_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64Qu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vrev64q_u8 (arg0_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64f32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrev64_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64p16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+-
+- out_poly16x4_t = vrev64_p16 (arg0_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+-
+- out_poly8x8_t = vrev64_p8 (arg0_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64s16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vrev64_s16 (arg0_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64s32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vrev64_s32 (arg0_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vrev64_s8 (arg0_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64u16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vrev64_u16 (arg0_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64u32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vrev64_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrev64u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrev64u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrev64u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vrev64_u8 (arg0_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrev64\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndaf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndaf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndaf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrnda_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrinta\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndaqf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndaq_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndaqf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrndaq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrinta\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrnd_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintz\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndmf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndmf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndmf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrndm_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintm\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndmqf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndmq_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndmqf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrndmq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintm\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndnf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndnf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndnf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrndn_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintn\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndnqf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndnq_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndnqf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrndnq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintn\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndpf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndpf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndpf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrndp_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintp\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndpqf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndpq_f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndpqf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrndpq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintp\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrndqf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrndqf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_v8_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_v8_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrndqf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrndq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrintz\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrteQf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrsqrteQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrteQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+-
+- out_float32x4_t = vrsqrteq_f32 (arg0_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrte\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrteQu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrsqrteQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrteQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vrsqrteq_u32 (arg0_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrte\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrtef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrsqrtef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrtef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+-
+- out_float32x2_t = vrsqrte_f32 (arg0_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrte\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrteu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vrsqrteu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrteu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vrsqrte_u32 (arg0_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrte\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrtsQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vrsqrtsQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrtsQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vrsqrtsq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrts\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vrsqrtsf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vrsqrtsf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vrsqrtsf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vrsqrts_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vrsqrts\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanef32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32_t arg0_float32_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vsetq_lane_f32 (arg0_float32_t, arg1_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanep16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanep16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16_t arg0_poly16_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8_t = vsetq_lane_p16 (arg0_poly16_t, arg1_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanep8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanep8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8_t arg0_poly8_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vsetq_lane_p8 (arg0_poly8_t, arg1_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanes16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16_t arg0_int16_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vsetq_lane_s16 (arg0_int16_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanes32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32_t arg0_int32_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vsetq_lane_s32 (arg0_int32_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanes64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanes64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64_t arg0_int64_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vsetq_lane_s64 (arg0_int64_t, arg1_int64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[rR\]\[0-9\]+, \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_lanes8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_lanes8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8_t arg0_int8_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vsetq_lane_s8 (arg0_int8_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_laneu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_laneu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16_t arg0_uint16_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vsetq_lane_u16 (arg0_uint16_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_laneu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_laneu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32_t arg0_uint32_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vsetq_lane_u32 (arg0_uint32_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_laneu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_laneu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64_t arg0_uint64_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vsetq_lane_u64 (arg0_uint64_t, arg1_uint64x2_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\[ \]+\[dD\]\[0-9\]+, \[rR\]\[0-9\]+, \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsetQ_laneu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsetQ_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsetQ_laneu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8_t arg0_uint8_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vsetq_lane_u8 (arg0_uint8_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanef32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanef32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32_t arg0_float32_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vset_lane_f32 (arg0_float32_t, arg1_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanep16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanep16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16_t arg0_poly16_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4_t = vset_lane_p16 (arg0_poly16_t, arg1_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanep8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanep8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8_t arg0_poly8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vset_lane_p8 (arg0_poly8_t, arg1_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanes16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanes16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16_t arg0_int16_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vset_lane_s16 (arg0_int16_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanes32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanes32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32_t arg0_int32_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vset_lane_s32 (arg0_int32_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vset_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanes64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64_t arg0_int64_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vset_lane_s64 (arg0_int64_t, arg1_int64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_lanes8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_lanes8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8_t arg0_int8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vset_lane_s8 (arg0_int8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_laneu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_laneu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16_t arg0_uint16_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vset_lane_u16 (arg0_uint16_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.16\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_laneu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_laneu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32_t arg0_uint32_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vset_lane_u32 (arg0_uint32_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.32\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vset_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_laneu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64_t arg0_uint64_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vset_lane_u64 (arg0_uint64_t, arg1_uint64x1_t, 0);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vset_laneu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vset_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vset_laneu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8_t arg0_uint8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vset_lane_u8 (arg0_uint8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vmov\.8\[ \]+\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[rR\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vshlq_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vshlq_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x2_t = vshlq_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vshlq_n_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vshlq_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vshlq_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x2_t = vshlq_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshlQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vshlq_n_u8 (arg0_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vshlq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vshlq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vshlq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vshlq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vshlq_u16 (arg0_uint16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vshlq_u32 (arg0_uint32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_uint64x2_t = vshlq_u64 (arg0_uint64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vshlq_u8 (arg0_uint8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vshl_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vshl_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x1_t = vshl_n_s64 (arg0_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vshl_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vshl_n_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vshl_n_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x1_t = vshl_n_u64 (arg0_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshl_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshl_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshl_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vshl_n_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_ns16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int32x4_t = vshll_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_ns32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int64x2_t = vshll_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_ns8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int16x8_t = vshll_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_nu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint32x4_t = vshll_n_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_nu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint64x2_t = vshll_n_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshll_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshll_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshll_nu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint16x8_t = vshll_n_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshll\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshls16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vshl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshls32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vshl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshls64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshls64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshls64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vshl_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshls8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vshl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vshl_u16 (arg0_uint16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vshl_u32 (arg0_uint32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_uint64x1_t = vshl_u64 (arg0_uint64x1_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshlu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vshlu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshlu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vshl_u8 (arg0_uint8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vshl\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int16x8_t = vshrq_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int32x4_t = vshrq_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int64x2_t = vshrq_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+-
+- out_int8x16_t = vshrq_n_s8 (arg0_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint16x8_t = vshrq_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint32x4_t = vshrq_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint64x2_t = vshrq_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrQ_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+-
+- out_uint8x16_t = vshrq_n_u8 (arg0_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+-
+- out_int16x4_t = vshr_n_s16 (arg0_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+-
+- out_int32x2_t = vshr_n_s32 (arg0_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+-
+- out_int64x1_t = vshr_n_s64 (arg0_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_ns8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+-
+- out_int8x8_t = vshr_n_s8 (arg0_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+-
+- out_uint16x4_t = vshr_n_u16 (arg0_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+-
+- out_uint32x2_t = vshr_n_u32 (arg0_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+-
+- out_uint64x1_t = vshr_n_u64 (arg0_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshr_nu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshr_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshr_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+-
+- out_uint8x8_t = vshr_n_u8 (arg0_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshr\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_ns16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_ns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+-
+- out_int8x8_t = vshrn_n_s16 (arg0_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_ns32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_ns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+-
+- out_int16x4_t = vshrn_n_s32 (arg0_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_ns64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_ns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+-
+- out_int32x2_t = vshrn_n_s64 (arg0_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_nu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_nu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+-
+- out_uint8x8_t = vshrn_n_u16 (arg0_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_nu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_nu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+-
+- out_uint16x4_t = vshrn_n_u32 (arg0_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vshrn_nu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vshrn_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vshrn_nu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+-
+- out_uint32x2_t = vshrn_n_u64 (arg0_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vshrn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_np16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_np16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8_t = vsliq_n_p16 (arg0_poly16x8_t, arg1_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_np64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_np64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x2_t arg0_poly64x2_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- out_poly64x2_t = vsliq_n_p64 (arg0_poly64x2_t, arg1_poly64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_np8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_np8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vsliq_n_p8 (arg0_poly8x16_t, arg1_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vsliq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vsliq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vsliq_n_s64 (arg0_int64x2_t, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vsliq_n_s8 (arg0_int8x16_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vsliq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vsliq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vsliq_n_u64 (arg0_uint64x2_t, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsliQ_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsliQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsliQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vsliq_n_u8 (arg0_uint8x16_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_np16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_np16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4_t = vsli_n_p16 (arg0_poly16x4_t, arg1_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_np64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_np64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- out_poly64x1_t = vsli_n_p64 (arg0_poly64x1_t, arg1_poly64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_np8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_np8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vsli_n_p8 (arg0_poly8x8_t, arg1_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vsli_n_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vsli_n_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vsli_n_s64 (arg0_int64x1_t, arg1_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vsli_n_s8 (arg0_int8x8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vsli_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vsli_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vsli_n_u64 (arg0_uint64x1_t, arg1_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsli_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsli_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsli_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vsli_n_u8 (arg0_uint8x8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsli\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vsraq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vsraq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vsraq_n_s64 (arg0_int64x2_t, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vsraq_n_s8 (arg0_int8x16_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vsraq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vsraq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vsraq_n_u64 (arg0_uint64x2_t, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsraQ_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsraQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsraQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vsraq_n_u8 (arg0_uint8x16_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vsra_n_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vsra_n_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vsra_n_s64 (arg0_int64x1_t, arg1_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vsra_n_s8 (arg0_int8x8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vsra_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vsra_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vsra_n_u64 (arg0_uint64x1_t, arg1_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsra_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsra_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsra_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vsra_n_u8 (arg0_uint8x8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsra\.u8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_np16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_np16 (void)
+-{
+- poly16x8_t out_poly16x8_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8_t = vsriq_n_p16 (arg0_poly16x8_t, arg1_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_np64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_np64 (void)
+-{
+- poly64x2_t out_poly64x2_t;
+- poly64x2_t arg0_poly64x2_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- out_poly64x2_t = vsriq_n_p64 (arg0_poly64x2_t, arg1_poly64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_np8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_np8 (void)
+-{
+- poly8x16_t out_poly8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16_t = vsriq_n_p8 (arg0_poly8x16_t, arg1_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_ns16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vsriq_n_s16 (arg0_int16x8_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_ns32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vsriq_n_s32 (arg0_int32x4_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_ns64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vsriq_n_s64 (arg0_int64x2_t, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_ns8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vsriq_n_s8 (arg0_int8x16_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_nu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vsriq_n_u16 (arg0_uint16x8_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_nu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vsriq_n_u32 (arg0_uint32x4_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_nu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vsriq_n_u64 (arg0_uint64x2_t, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsriQ_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsriQ_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsriQ_nu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vsriq_n_u8 (arg0_uint8x16_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_np16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_np16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_np16 (void)
+-{
+- poly16x4_t out_poly16x4_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4_t = vsri_n_p16 (arg0_poly16x4_t, arg1_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_np64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_np64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_np64 (void)
+-{
+- poly64x1_t out_poly64x1_t;
+- poly64x1_t arg0_poly64x1_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- out_poly64x1_t = vsri_n_p64 (arg0_poly64x1_t, arg1_poly64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_np8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_np8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_np8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8_t = vsri_n_p8 (arg0_poly8x8_t, arg1_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_ns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_ns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_ns16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vsri_n_s16 (arg0_int16x4_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_ns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_ns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_ns32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vsri_n_s32 (arg0_int32x2_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_ns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_ns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_ns64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vsri_n_s64 (arg0_int64x1_t, arg1_int64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_ns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_ns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_ns8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vsri_n_s8 (arg0_int8x8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_nu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_nu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_nu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vsri_n_u16 (arg0_uint16x4_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_nu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_nu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_nu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vsri_n_u32 (arg0_uint32x2_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_nu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_nu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_nu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vsri_n_u64 (arg0_uint64x1_t, arg1_uint64x1_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.64\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsri_nu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsri_nu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsri_nu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vsri_n_u8 (arg0_uint8x8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vsri\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4_t arg1_float32x4_t;
+-
+- vst1q_lane_f32 (arg0_float32_t, arg1_float32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- vst1q_lane_p16 (arg0_poly16_t, arg1_poly16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanep64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanep64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- vst1q_lane_p64 (arg0_poly64_t, arg1_poly64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanep8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- vst1q_lane_p8 (arg0_poly8_t, arg1_poly8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8_t arg1_int16x8_t;
+-
+- vst1q_lane_s16 (arg0_int16_t, arg1_int16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4_t arg1_int32x4_t;
+-
+- vst1q_lane_s32 (arg0_int32_t, arg1_int32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanes64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x2_t arg1_int64x2_t;
+-
+- vst1q_lane_s64 (arg0_int64_t, arg1_int64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_lanes8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x16_t arg1_int8x16_t;
+-
+- vst1q_lane_s8 (arg0_int8_t, arg1_int8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- vst1q_lane_u16 (arg0_uint16_t, arg1_uint16x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- vst1q_lane_u32 (arg0_uint32_t, arg1_uint32x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu64-1.c
++++ b/src//dev/null
+@@ -1,25 +0,0 @@
+-/* Test the `vst1Q_laneu64' ARM Neon intrinsic. */
+-
+-/* Detect ICE in the case of unaligned memory address. */
+-
+-/* { dg-do compile } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-unsigned char dummy_store[1000];
+-
+-void
+-foo (char* addr)
+-{
+- uint8x16_t vdata = vld1q_u8 (addr);
+- vst1q_lane_u64 ((uint64_t*) &dummy_store, vreinterpretq_u64_u8 (vdata), 0);
+-}
+-
+-uint64_t
+-bar (uint64x2_t vdata)
+-{
+- vdata = vld1q_lane_u64 ((uint64_t*) &dummy_store, vdata, 0);
+- return vgetq_lane_u64 (vdata, 0);
+-}
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_laneu64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- vst1q_lane_u64 (arg0_uint64_t, arg1_uint64x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Q_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Q_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Q_laneu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- vst1q_lane_u8 (arg0_uint8_t, arg1_uint8x16_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qf32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qf32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4_t arg1_float32x4_t;
+-
+- vst1q_f32 (arg0_float32_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qp16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qp16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- vst1q_p16 (arg0_poly16_t, arg1_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qp64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qp64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qp64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x2_t arg1_poly64x2_t;
+-
+- vst1q_p64 (arg0_poly64_t, arg1_poly64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qp8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qp8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- vst1q_p8 (arg0_poly8_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qs16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qs16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8_t arg1_int16x8_t;
+-
+- vst1q_s16 (arg0_int16_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qs32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qs32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4_t arg1_int32x4_t;
+-
+- vst1q_s32 (arg0_int32_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qs64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qs64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x2_t arg1_int64x2_t;
+-
+- vst1q_s64 (arg0_int64_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qs8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qs8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x16_t arg1_int8x16_t;
+-
+- vst1q_s8 (arg0_int8_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- vst1q_u16 (arg0_uint16_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- vst1q_u32 (arg0_uint32_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qu64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- vst1q_u64 (arg0_uint64_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1Qu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1Qu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- vst1q_u8 (arg0_uint8_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2_t arg1_float32x2_t;
+-
+- vst1_lane_f32 (arg0_float32_t, arg1_float32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- vst1_lane_p16 (arg0_poly16_t, arg1_poly16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanep64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanep64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanep64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- vst1_lane_p64 (arg0_poly64_t, arg1_poly64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanep8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- vst1_lane_p8 (arg0_poly8_t, arg1_poly8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4_t arg1_int16x4_t;
+-
+- vst1_lane_s16 (arg0_int16_t, arg1_int16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2_t arg1_int32x2_t;
+-
+- vst1_lane_s32 (arg0_int32_t, arg1_int32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanes64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanes64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanes64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x1_t arg1_int64x1_t;
+-
+- vst1_lane_s64 (arg0_int64_t, arg1_int64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_lanes8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- vst1_lane_s8 (arg0_int8_t, arg1_int8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- vst1_lane_u16 (arg0_uint16_t, arg1_uint16x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- vst1_lane_u32 (arg0_uint32_t, arg1_uint32x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_laneu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_laneu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_laneu64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- vst1_lane_u64 (arg0_uint64_t, arg1_uint64x1_t, 0);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1_laneu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- vst1_lane_u8 (arg0_uint8_t, arg1_uint8x8_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]\\\})|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1f32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2_t arg1_float32x2_t;
+-
+- vst1_f32 (arg0_float32_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1p16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- vst1_p16 (arg0_poly16_t, arg1_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1p64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1p64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x1_t arg1_poly64x1_t;
+-
+- vst1_p64 (arg0_poly64_t, arg1_poly64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1p8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- vst1_p8 (arg0_poly8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1s16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4_t arg1_int16x4_t;
+-
+- vst1_s16 (arg0_int16_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1s32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2_t arg1_int32x2_t;
+-
+- vst1_s32 (arg0_int32_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1s64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1s64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x1_t arg1_int64x1_t;
+-
+- vst1_s64 (arg0_int64_t, arg1_int64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1s8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- vst1_s8 (arg0_int8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1u16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- vst1_u16 (arg0_uint16_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.16\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1u32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- vst1_u32 (arg0_uint32_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.32\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1u64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1u64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- vst1_u64 (arg0_uint64_t, arg1_uint64x1_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst1u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst1u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst1u8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- vst1_u8 (arg0_uint8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.8\[ \]+((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x2_t arg1_float32x4x2_t;
+-
+- vst2q_lane_f32 (arg0_float32_t, arg1_float32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x2_t arg1_poly16x8x2_t;
+-
+- vst2q_lane_p16 (arg0_poly16_t, arg1_poly16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x2_t arg1_int16x8x2_t;
+-
+- vst2q_lane_s16 (arg0_int16_t, arg1_int16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x2_t arg1_int32x4x2_t;
+-
+- vst2q_lane_s32 (arg0_int32_t, arg1_int32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x2_t arg1_uint16x8x2_t;
+-
+- vst2q_lane_u16 (arg0_uint16_t, arg1_uint16x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Q_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x2_t arg1_uint32x4x2_t;
+-
+- vst2q_lane_u32 (arg0_uint32_t, arg1_uint32x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qf32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x2_t arg1_float32x4x2_t;
+-
+- vst2q_f32 (arg0_float32_t, arg1_float32x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qp16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x2_t arg1_poly16x8x2_t;
+-
+- vst2q_p16 (arg0_poly16_t, arg1_poly16x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qp8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x16x2_t arg1_poly8x16x2_t;
+-
+- vst2q_p8 (arg0_poly8_t, arg1_poly8x16x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qs16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x2_t arg1_int16x8x2_t;
+-
+- vst2q_s16 (arg0_int16_t, arg1_int16x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qs32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x2_t arg1_int32x4x2_t;
+-
+- vst2q_s32 (arg0_int32_t, arg1_int32x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qs8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x16x2_t arg1_int8x16x2_t;
+-
+- vst2q_s8 (arg0_int8_t, arg1_int8x16x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x2_t arg1_uint16x8x2_t;
+-
+- vst2q_u16 (arg0_uint16_t, arg1_uint16x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x2_t arg1_uint32x4x2_t;
+-
+- vst2q_u32 (arg0_uint32_t, arg1_uint32x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2Qu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst2Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2Qu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x16x2_t arg1_uint8x16x2_t;
+-
+- vst2q_u8 (arg0_uint8_t, arg1_uint8x16x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x2_t arg1_float32x2x2_t;
+-
+- vst2_lane_f32 (arg0_float32_t, arg1_float32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x2_t arg1_poly16x4x2_t;
+-
+- vst2_lane_p16 (arg0_poly16_t, arg1_poly16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanep8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x2_t arg1_poly8x8x2_t;
+-
+- vst2_lane_p8 (arg0_poly8_t, arg1_poly8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x2_t arg1_int16x4x2_t;
+-
+- vst2_lane_s16 (arg0_int16_t, arg1_int16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x2_t arg1_int32x2x2_t;
+-
+- vst2_lane_s32 (arg0_int32_t, arg1_int32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_lanes8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x2_t arg1_int8x8x2_t;
+-
+- vst2_lane_s8 (arg0_int8_t, arg1_int8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x2_t arg1_uint16x4x2_t;
+-
+- vst2_lane_u16 (arg0_uint16_t, arg1_uint16x4x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x2_t arg1_uint32x2x2_t;
+-
+- vst2_lane_u32 (arg0_uint32_t, arg1_uint32x2x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2_laneu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x2_t arg1_uint8x8x2_t;
+-
+- vst2_lane_u8 (arg0_uint8_t, arg1_uint8x8x2_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2f32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x2_t arg1_float32x2x2_t;
+-
+- vst2_f32 (arg0_float32_t, arg1_float32x2x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2p16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x2_t arg1_poly16x4x2_t;
+-
+- vst2_p16 (arg0_poly16_t, arg1_poly16x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2p64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2p64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x1x2_t arg1_poly64x1x2_t;
+-
+- vst2_p64 (arg0_poly64_t, arg1_poly64x1x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2p8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x2_t arg1_poly8x8x2_t;
+-
+- vst2_p8 (arg0_poly8_t, arg1_poly8x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2s16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x2_t arg1_int16x4x2_t;
+-
+- vst2_s16 (arg0_int16_t, arg1_int16x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2s32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x2_t arg1_int32x2x2_t;
+-
+- vst2_s32 (arg0_int32_t, arg1_int32x2x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2s64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2s64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x1x2_t arg1_int64x1x2_t;
+-
+- vst2_s64 (arg0_int64_t, arg1_int64x1x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2s8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x2_t arg1_int8x8x2_t;
+-
+- vst2_s8 (arg0_int8_t, arg1_int8x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2u16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x2_t arg1_uint16x4x2_t;
+-
+- vst2_u16 (arg0_uint16_t, arg1_uint16x4x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2u32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x2_t arg1_uint32x2x2_t;
+-
+- vst2_u32 (arg0_uint32_t, arg1_uint32x2x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2u64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2u64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x1x2_t arg1_uint64x1x2_t;
+-
+- vst2_u64 (arg0_uint64_t, arg1_uint64x1x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst2u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst2u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst2u8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x2_t arg1_uint8x8x2_t;
+-
+- vst2_u8 (arg0_uint8_t, arg1_uint8x8x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst2\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x3_t arg1_float32x4x3_t;
+-
+- vst3q_lane_f32 (arg0_float32_t, arg1_float32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x3_t arg1_poly16x8x3_t;
+-
+- vst3q_lane_p16 (arg0_poly16_t, arg1_poly16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x3_t arg1_int16x8x3_t;
+-
+- vst3q_lane_s16 (arg0_int16_t, arg1_int16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x3_t arg1_int32x4x3_t;
+-
+- vst3q_lane_s32 (arg0_int32_t, arg1_int32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x3_t arg1_uint16x8x3_t;
+-
+- vst3q_lane_u16 (arg0_uint16_t, arg1_uint16x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Q_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x3_t arg1_uint32x4x3_t;
+-
+- vst3q_lane_u32 (arg0_uint32_t, arg1_uint32x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qf32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x3_t arg1_float32x4x3_t;
+-
+- vst3q_f32 (arg0_float32_t, arg1_float32x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qp16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x3_t arg1_poly16x8x3_t;
+-
+- vst3q_p16 (arg0_poly16_t, arg1_poly16x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qp8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x16x3_t arg1_poly8x16x3_t;
+-
+- vst3q_p8 (arg0_poly8_t, arg1_poly8x16x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qs16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x3_t arg1_int16x8x3_t;
+-
+- vst3q_s16 (arg0_int16_t, arg1_int16x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qs32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x3_t arg1_int32x4x3_t;
+-
+- vst3q_s32 (arg0_int32_t, arg1_int32x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qs8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x16x3_t arg1_int8x16x3_t;
+-
+- vst3q_s8 (arg0_int8_t, arg1_int8x16x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x3_t arg1_uint16x8x3_t;
+-
+- vst3q_u16 (arg0_uint16_t, arg1_uint16x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x3_t arg1_uint32x4x3_t;
+-
+- vst3q_u32 (arg0_uint32_t, arg1_uint32x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3Qu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst3Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3Qu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x16x3_t arg1_uint8x16x3_t;
+-
+- vst3q_u8 (arg0_uint8_t, arg1_uint8x16x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x3_t arg1_float32x2x3_t;
+-
+- vst3_lane_f32 (arg0_float32_t, arg1_float32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x3_t arg1_poly16x4x3_t;
+-
+- vst3_lane_p16 (arg0_poly16_t, arg1_poly16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanep8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x3_t arg1_poly8x8x3_t;
+-
+- vst3_lane_p8 (arg0_poly8_t, arg1_poly8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x3_t arg1_int16x4x3_t;
+-
+- vst3_lane_s16 (arg0_int16_t, arg1_int16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x3_t arg1_int32x2x3_t;
+-
+- vst3_lane_s32 (arg0_int32_t, arg1_int32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_lanes8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x3_t arg1_int8x8x3_t;
+-
+- vst3_lane_s8 (arg0_int8_t, arg1_int8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x3_t arg1_uint16x4x3_t;
+-
+- vst3_lane_u16 (arg0_uint16_t, arg1_uint16x4x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x3_t arg1_uint32x2x3_t;
+-
+- vst3_lane_u32 (arg0_uint32_t, arg1_uint32x2x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3_laneu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x3_t arg1_uint8x8x3_t;
+-
+- vst3_lane_u8 (arg0_uint8_t, arg1_uint8x8x3_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3f32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x3_t arg1_float32x2x3_t;
+-
+- vst3_f32 (arg0_float32_t, arg1_float32x2x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3p16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x3_t arg1_poly16x4x3_t;
+-
+- vst3_p16 (arg0_poly16_t, arg1_poly16x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3p64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3p64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x1x3_t arg1_poly64x1x3_t;
+-
+- vst3_p64 (arg0_poly64_t, arg1_poly64x1x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3p8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x3_t arg1_poly8x8x3_t;
+-
+- vst3_p8 (arg0_poly8_t, arg1_poly8x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3s16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x3_t arg1_int16x4x3_t;
+-
+- vst3_s16 (arg0_int16_t, arg1_int16x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3s32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x3_t arg1_int32x2x3_t;
+-
+- vst3_s32 (arg0_int32_t, arg1_int32x2x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3s64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3s64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x1x3_t arg1_int64x1x3_t;
+-
+- vst3_s64 (arg0_int64_t, arg1_int64x1x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3s8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x3_t arg1_int8x8x3_t;
+-
+- vst3_s8 (arg0_int8_t, arg1_int8x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3u16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x3_t arg1_uint16x4x3_t;
+-
+- vst3_u16 (arg0_uint16_t, arg1_uint16x4x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3u32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x3_t arg1_uint32x2x3_t;
+-
+- vst3_u32 (arg0_uint32_t, arg1_uint32x2x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3u64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3u64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x1x3_t arg1_uint64x1x3_t;
+-
+- vst3_u64 (arg0_uint64_t, arg1_uint64x1x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst3u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst3u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst3u8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x3_t arg1_uint8x8x3_t;
+-
+- vst3_u8 (arg0_uint8_t, arg1_uint8x8x3_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst3\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x4_t arg1_float32x4x4_t;
+-
+- vst4q_lane_f32 (arg0_float32_t, arg1_float32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x4_t arg1_poly16x8x4_t;
+-
+- vst4q_lane_p16 (arg0_poly16_t, arg1_poly16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x4_t arg1_int16x8x4_t;
+-
+- vst4q_lane_s16 (arg0_int16_t, arg1_int16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x4_t arg1_int32x4x4_t;
+-
+- vst4q_lane_s32 (arg0_int32_t, arg1_int32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x4_t arg1_uint16x8x4_t;
+-
+- vst4q_lane_u16 (arg0_uint16_t, arg1_uint16x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Q_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4Q_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Q_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x4_t arg1_uint32x4x4_t;
+-
+- vst4q_lane_u32 (arg0_uint32_t, arg1_uint32x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qf32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x4x4_t arg1_float32x4x4_t;
+-
+- vst4q_f32 (arg0_float32_t, arg1_float32x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qp16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x8x4_t arg1_poly16x8x4_t;
+-
+- vst4q_p16 (arg0_poly16_t, arg1_poly16x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qp8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x16x4_t arg1_poly8x16x4_t;
+-
+- vst4q_p8 (arg0_poly8_t, arg1_poly8x16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qs16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x8x4_t arg1_int16x8x4_t;
+-
+- vst4q_s16 (arg0_int16_t, arg1_int16x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qs32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x4x4_t arg1_int32x4x4_t;
+-
+- vst4q_s32 (arg0_int32_t, arg1_int32x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qs8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x16x4_t arg1_int8x16x4_t;
+-
+- vst4q_s8 (arg0_int8_t, arg1_int8x16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x8x4_t arg1_uint16x8x4_t;
+-
+- vst4q_u16 (arg0_uint16_t, arg1_uint16x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x4x4_t arg1_uint32x4x4_t;
+-
+- vst4q_u32 (arg0_uint32_t, arg1_uint32x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4Qu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vst4Qu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4Qu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x16x4_t arg1_uint8x16x4_t;
+-
+- vst4q_u8 (arg0_uint8_t, arg1_uint8x16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanef32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanef32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanef32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x4_t arg1_float32x2x4_t;
+-
+- vst4_lane_f32 (arg0_float32_t, arg1_float32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanep16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanep16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanep16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x4_t arg1_poly16x4x4_t;
+-
+- vst4_lane_p16 (arg0_poly16_t, arg1_poly16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanep8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanep8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanep8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x4_t arg1_poly8x8x4_t;
+-
+- vst4_lane_p8 (arg0_poly8_t, arg1_poly8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanes16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanes16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanes16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x4_t arg1_int16x4x4_t;
+-
+- vst4_lane_s16 (arg0_int16_t, arg1_int16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanes32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanes32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanes32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x4_t arg1_int32x2x4_t;
+-
+- vst4_lane_s32 (arg0_int32_t, arg1_int32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_lanes8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_lanes8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_lanes8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x4_t arg1_int8x8x4_t;
+-
+- vst4_lane_s8 (arg0_int8_t, arg1_int8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_laneu16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_laneu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_laneu16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x4_t arg1_uint16x4x4_t;
+-
+- vst4_lane_u16 (arg0_uint16_t, arg1_uint16x4x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_laneu32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_laneu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_laneu32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x4_t arg1_uint32x2x4_t;
+-
+- vst4_lane_u32 (arg0_uint32_t, arg1_uint32x2x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4_laneu8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4_laneu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4_laneu8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x4_t arg1_uint8x8x4_t;
+-
+- vst4_lane_u8 (arg0_uint8_t, arg1_uint8x8x4_t, 1);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+\\\[\[0-9\]+\\\]-\[dD\]\[0-9\]+\\\[\[0-9\]+\\\])|(\[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\], \[dD\]\[0-9\]+\\\[\[0-9\]+\\\]))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4f32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4f32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4f32 (void)
+-{
+- float32_t *arg0_float32_t;
+- float32x2x4_t arg1_float32x2x4_t;
+-
+- vst4_f32 (arg0_float32_t, arg1_float32x2x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4p16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4p16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4p16 (void)
+-{
+- poly16_t *arg0_poly16_t;
+- poly16x4x4_t arg1_poly16x4x4_t;
+-
+- vst4_p16 (arg0_poly16_t, arg1_poly16x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4p64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4p64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_crypto_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_crypto } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4p64 (void)
+-{
+- poly64_t *arg0_poly64_t;
+- poly64x1x4_t arg1_poly64x1x4_t;
+-
+- vst4_p64 (arg0_poly64_t, arg1_poly64x1x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4p8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4p8 (void)
+-{
+- poly8_t *arg0_poly8_t;
+- poly8x8x4_t arg1_poly8x8x4_t;
+-
+- vst4_p8 (arg0_poly8_t, arg1_poly8x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4s16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4s16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4s16 (void)
+-{
+- int16_t *arg0_int16_t;
+- int16x4x4_t arg1_int16x4x4_t;
+-
+- vst4_s16 (arg0_int16_t, arg1_int16x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4s32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4s32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4s32 (void)
+-{
+- int32_t *arg0_int32_t;
+- int32x2x4_t arg1_int32x2x4_t;
+-
+- vst4_s32 (arg0_int32_t, arg1_int32x2x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4s64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4s64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4s64 (void)
+-{
+- int64_t *arg0_int64_t;
+- int64x1x4_t arg1_int64x1x4_t;
+-
+- vst4_s64 (arg0_int64_t, arg1_int64x1x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4s8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4s8 (void)
+-{
+- int8_t *arg0_int8_t;
+- int8x8x4_t arg1_int8x8x4_t;
+-
+- vst4_s8 (arg0_int8_t, arg1_int8x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4u16.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4u16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4u16 (void)
+-{
+- uint16_t *arg0_uint16_t;
+- uint16x4x4_t arg1_uint16x4x4_t;
+-
+- vst4_u16 (arg0_uint16_t, arg1_uint16x4x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.16\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4u32.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4u32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4u32 (void)
+-{
+- uint32_t *arg0_uint32_t;
+- uint32x2x4_t arg1_uint32x2x4_t;
+-
+- vst4_u32 (arg0_uint32_t, arg1_uint32x2x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.32\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4u64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4u64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4u64 (void)
+-{
+- uint64_t *arg0_uint64_t;
+- uint64x1x4_t arg1_uint64x1x4_t;
+-
+- vst4_u64 (arg0_uint64_t, arg1_uint64x1x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst1\.64\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vst4u8.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vst4u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vst4u8 (void)
+-{
+- uint8_t *arg0_uint8_t;
+- uint8x8x4_t arg1_uint8x8x4_t;
+-
+- vst4_u8 (arg0_uint8_t, arg1_uint8x8x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vst4\.8\[ \]+\\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \\\[\[rR\]\[0-9\]+\(:\[0-9\]+\)?\\\]!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQf32 (void)
+-{
+- float32x4_t out_float32x4_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4_t = vsubq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.f32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQs16 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8_t = vsubq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQs32 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4_t = vsubq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQs64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQs64 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int64x2_t = vsubq_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQs8 (void)
+-{
+- int8x16_t out_int8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16_t = vsubq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vsubq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vsubq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQu64 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint64x2_t = vsubq_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i64\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vsubq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubf32 (void)
+-{
+- float32x2_t out_float32x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2_t = vsub_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.f32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhns16 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int8x8_t = vsubhn_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhns32 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int16x4_t = vsubhn_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhns64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhns64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhns64 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int64x2_t arg0_int64x2_t;
+- int64x2_t arg1_int64x2_t;
+-
+- out_int32x2_t = vsubhn_s64 (arg0_int64x2_t, arg1_int64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhnu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhnu16 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint8x8_t = vsubhn_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i16\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhnu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhnu32 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint16x4_t = vsubhn_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i32\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubhnu64.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubhnu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubhnu64 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint64x2_t arg1_uint64x2_t;
+-
+- out_uint32x2_t = vsubhn_u64 (arg0_uint64x2_t, arg1_uint64x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubhn\.i64\[ \]+\[dD\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubls16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubls16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubls16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vsubl_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.s16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubls32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubls32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubls32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vsubl_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.s32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubls8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubls8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubls8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vsubl_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.s8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsublu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsublu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsublu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vsubl_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.u16\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsublu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsublu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsublu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vsubl_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.u32\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsublu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsublu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsublu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vsubl_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubl\.u8\[ \]+\[qQ\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubs16 (void)
+-{
+- int16x4_t out_int16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4_t = vsub_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubs32 (void)
+-{
+- int32x2_t out_int32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2_t = vsub_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubs64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vsubs64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubs64 (void)
+-{
+- int64x1_t out_int64x1_t;
+- int64x1_t arg0_int64x1_t;
+- int64x1_t arg1_int64x1_t;
+-
+- out_int64x1_t = vsub_s64 (arg0_int64x1_t, arg1_int64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubs8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vsub_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vsub_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vsub_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubu64.c
++++ b/src//dev/null
+@@ -1,19 +0,0 @@
+-/* Test the `vsubu64' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubu64 (void)
+-{
+- uint64x1_t out_uint64x1_t;
+- uint64x1_t arg0_uint64x1_t;
+- uint64x1_t arg1_uint64x1_t;
+-
+- out_uint64x1_t = vsub_u64 (arg0_uint64x1_t, arg1_uint64x1_t);
+-}
+-
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vsub_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsub\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubws16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubws16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubws16 (void)
+-{
+- int32x4_t out_int32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int32x4_t = vsubw_s16 (arg0_int32x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubws32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubws32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubws32 (void)
+-{
+- int64x2_t out_int64x2_t;
+- int64x2_t arg0_int64x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int64x2_t = vsubw_s32 (arg0_int64x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubws8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubws8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubws8 (void)
+-{
+- int16x8_t out_int16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int16x8_t = vsubw_s8 (arg0_int16x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubwu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubwu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubwu16 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint32x4_t = vsubw_u16 (arg0_uint32x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubwu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubwu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubwu32 (void)
+-{
+- uint64x2_t out_uint64x2_t;
+- uint64x2_t arg0_uint64x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint64x2_t = vsubw_u32 (arg0_uint64x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vsubwu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vsubwu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vsubwu8 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint16x8_t = vsubw_u8 (arg0_uint16x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vsubw\.u8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl1p8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl1p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl1p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_poly8x8_t = vtbl1_p8 (arg0_poly8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl1s8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl1s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl1s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vtbl1_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl1u8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl1u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl1u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vtbl1_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl2p8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl2p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl2p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8x2_t arg0_poly8x8x2_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_poly8x8_t = vtbl2_p8 (arg0_poly8x8x2_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl2s8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl2s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl2s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8x2_t arg0_int8x8x2_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vtbl2_s8 (arg0_int8x8x2_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl2u8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl2u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl2u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8x2_t arg0_uint8x8x2_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vtbl2_u8 (arg0_uint8x8x2_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl3p8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl3p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl3p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8x3_t arg0_poly8x8x3_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_poly8x8_t = vtbl3_p8 (arg0_poly8x8x3_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl3s8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl3s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl3s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8x3_t arg0_int8x8x3_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vtbl3_s8 (arg0_int8x8x3_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl3u8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl3u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl3u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8x3_t arg0_uint8x8x3_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vtbl3_u8 (arg0_uint8x8x3_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl4p8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl4p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl4p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8x4_t arg0_poly8x8x4_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_poly8x8_t = vtbl4_p8 (arg0_poly8x8x4_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl4s8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl4s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl4s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8x4_t arg0_int8x8x4_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8_t = vtbl4_s8 (arg0_int8x8x4_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbl4u8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtbl4u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbl4u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8x4_t arg0_uint8x8x4_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vtbl4_u8 (arg0_uint8x8x4_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbl\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx1p8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx1p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx1p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_poly8x8_t = vtbx1_p8 (arg0_poly8x8_t, arg1_poly8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx1s8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx1s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx1s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vtbx1_s8 (arg0_int8x8_t, arg1_int8x8_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx1u8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx1u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx1u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vtbx1_u8 (arg0_uint8x8_t, arg1_uint8x8_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, ((\\\{\[dD\]\[0-9\]+\\\})|(\[dD\]\[0-9\]+)), \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx2p8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx2p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx2p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8x2_t arg1_poly8x8x2_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_poly8x8_t = vtbx2_p8 (arg0_poly8x8_t, arg1_poly8x8x2_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx2s8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx2s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx2s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8x2_t arg1_int8x8x2_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vtbx2_s8 (arg0_int8x8_t, arg1_int8x8x2_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx2u8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx2u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx2u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8x2_t arg1_uint8x8x2_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vtbx2_u8 (arg0_uint8x8_t, arg1_uint8x8x2_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx3p8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx3p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx3p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8x3_t arg1_poly8x8x3_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_poly8x8_t = vtbx3_p8 (arg0_poly8x8_t, arg1_poly8x8x3_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx3s8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx3s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx3s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8x3_t arg1_int8x8x3_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vtbx3_s8 (arg0_int8x8_t, arg1_int8x8x3_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx3u8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx3u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx3u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8x3_t arg1_uint8x8x3_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vtbx3_u8 (arg0_uint8x8_t, arg1_uint8x8x3_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx4p8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx4p8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx4p8 (void)
+-{
+- poly8x8_t out_poly8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8x4_t arg1_poly8x8x4_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_poly8x8_t = vtbx4_p8 (arg0_poly8x8_t, arg1_poly8x8x4_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx4s8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx4s8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx4s8 (void)
+-{
+- int8x8_t out_int8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8x4_t arg1_int8x8x4_t;
+- int8x8_t arg2_int8x8_t;
+-
+- out_int8x8_t = vtbx4_s8 (arg0_int8x8_t, arg1_int8x8x4_t, arg2_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtbx4u8.c
++++ b/src//dev/null
+@@ -1,21 +0,0 @@
+-/* Test the `vtbx4u8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtbx4u8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8x4_t arg1_uint8x8x4_t;
+- uint8x8_t arg2_uint8x8_t;
+-
+- out_uint8x8_t = vtbx4_u8 (arg0_uint8x8_t, arg1_uint8x8x4_t, arg2_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtbx\.8\[ \]+\[dD\]\[0-9\]+, \\\{((\[dD\]\[0-9\]+-\[dD\]\[0-9\]+)|(\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+))\\\}, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQf32 (void)
+-{
+- float32x4x2_t out_float32x4x2_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4x2_t = vtrnq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQp16 (void)
+-{
+- poly16x8x2_t out_poly16x8x2_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8x2_t = vtrnq_p16 (arg0_poly16x8_t, arg1_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQp8 (void)
+-{
+- poly8x16x2_t out_poly8x16x2_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16x2_t = vtrnq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQs16 (void)
+-{
+- int16x8x2_t out_int16x8x2_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8x2_t = vtrnq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQs32 (void)
+-{
+- int32x4x2_t out_int32x4x2_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4x2_t = vtrnq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQs8 (void)
+-{
+- int8x16x2_t out_int8x16x2_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16x2_t = vtrnq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQu16 (void)
+-{
+- uint16x8x2_t out_uint16x8x2_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8x2_t = vtrnq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQu32 (void)
+-{
+- uint32x4x2_t out_uint32x4x2_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4x2_t = vtrnq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnQu8 (void)
+-{
+- uint8x16x2_t out_uint8x16x2_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16x2_t = vtrnq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnf32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2x2_t = vtrn_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnp16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4x2_t = vtrn_p16 (arg0_poly16x4_t, arg1_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnp8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8x2_t = vtrn_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrns16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrns16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrns16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4x2_t = vtrn_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrns32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrns32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrns32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2x2_t = vtrn_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrns8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrns8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrns8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8x2_t = vtrn_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnu16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4x2_t = vtrn_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnu32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2x2_t = vtrn_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtrnu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtrnu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtrnu8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8x2_t = vtrn_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtrn\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQp8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_uint8x16_t = vtstq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQs16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_uint16x8_t = vtstq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQs32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_uint32x4_t = vtstq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQs8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_uint8x16_t = vtstq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQu16 (void)
+-{
+- uint16x8_t out_uint16x8_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8_t = vtstq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQu32 (void)
+-{
+- uint32x4_t out_uint32x4_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4_t = vtstq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstQu8 (void)
+-{
+- uint8x16_t out_uint8x16_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16_t = vtstq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstp8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_uint8x8_t = vtst_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtsts16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtsts16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtsts16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_uint16x4_t = vtst_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtsts32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtsts32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtsts32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_uint32x2_t = vtst_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtsts8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtsts8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtsts8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_uint8x8_t = vtst_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstu16 (void)
+-{
+- uint16x4_t out_uint16x4_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4_t = vtst_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstu32 (void)
+-{
+- uint32x2_t out_uint32x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2_t = vtst_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vtstu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vtstu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vtstu8 (void)
+-{
+- uint8x8_t out_uint8x8_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8_t = vtst_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vtst\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQf32 (void)
+-{
+- float32x4x2_t out_float32x4x2_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4x2_t = vuzpq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQp16 (void)
+-{
+- poly16x8x2_t out_poly16x8x2_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8x2_t = vuzpq_p16 (arg0_poly16x8_t, arg1_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQp8 (void)
+-{
+- poly8x16x2_t out_poly8x16x2_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16x2_t = vuzpq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQs16 (void)
+-{
+- int16x8x2_t out_int16x8x2_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8x2_t = vuzpq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQs32 (void)
+-{
+- int32x4x2_t out_int32x4x2_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4x2_t = vuzpq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQs8 (void)
+-{
+- int8x16x2_t out_int8x16x2_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16x2_t = vuzpq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQu16 (void)
+-{
+- uint16x8x2_t out_uint16x8x2_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8x2_t = vuzpq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQu32 (void)
+-{
+- uint32x4x2_t out_uint32x4x2_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4x2_t = vuzpq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpQu8 (void)
+-{
+- uint8x16x2_t out_uint8x16x2_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16x2_t = vuzpq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpf32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2x2_t = vuzp_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpp16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4x2_t = vuzp_p16 (arg0_poly16x4_t, arg1_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpp8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8x2_t = vuzp_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzps16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzps16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzps16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4x2_t = vuzp_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzps32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzps32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzps32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2x2_t = vuzp_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzps8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzps8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzps8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8x2_t = vuzp_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpu16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4x2_t = vuzp_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpu32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2x2_t = vuzp_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vuzpu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vuzpu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vuzpu8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8x2_t = vuzp_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQf32 (void)
+-{
+- float32x4x2_t out_float32x4x2_t;
+- float32x4_t arg0_float32x4_t;
+- float32x4_t arg1_float32x4_t;
+-
+- out_float32x4x2_t = vzipq_f32 (arg0_float32x4_t, arg1_float32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQp16 (void)
+-{
+- poly16x8x2_t out_poly16x8x2_t;
+- poly16x8_t arg0_poly16x8_t;
+- poly16x8_t arg1_poly16x8_t;
+-
+- out_poly16x8x2_t = vzipq_p16 (arg0_poly16x8_t, arg1_poly16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQp8 (void)
+-{
+- poly8x16x2_t out_poly8x16x2_t;
+- poly8x16_t arg0_poly8x16_t;
+- poly8x16_t arg1_poly8x16_t;
+-
+- out_poly8x16x2_t = vzipq_p8 (arg0_poly8x16_t, arg1_poly8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQs16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQs16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQs16 (void)
+-{
+- int16x8x2_t out_int16x8x2_t;
+- int16x8_t arg0_int16x8_t;
+- int16x8_t arg1_int16x8_t;
+-
+- out_int16x8x2_t = vzipq_s16 (arg0_int16x8_t, arg1_int16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQs32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQs32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQs32 (void)
+-{
+- int32x4x2_t out_int32x4x2_t;
+- int32x4_t arg0_int32x4_t;
+- int32x4_t arg1_int32x4_t;
+-
+- out_int32x4x2_t = vzipq_s32 (arg0_int32x4_t, arg1_int32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQs8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQs8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQs8 (void)
+-{
+- int8x16x2_t out_int8x16x2_t;
+- int8x16_t arg0_int8x16_t;
+- int8x16_t arg1_int8x16_t;
+-
+- out_int8x16x2_t = vzipq_s8 (arg0_int8x16_t, arg1_int8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQu16 (void)
+-{
+- uint16x8x2_t out_uint16x8x2_t;
+- uint16x8_t arg0_uint16x8_t;
+- uint16x8_t arg1_uint16x8_t;
+-
+- out_uint16x8x2_t = vzipq_u16 (arg0_uint16x8_t, arg1_uint16x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQu32 (void)
+-{
+- uint32x4x2_t out_uint32x4x2_t;
+- uint32x4_t arg0_uint32x4_t;
+- uint32x4_t arg1_uint32x4_t;
+-
+- out_uint32x4x2_t = vzipq_u32 (arg0_uint32x4_t, arg1_uint32x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipQu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipQu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipQu8 (void)
+-{
+- uint8x16x2_t out_uint8x16x2_t;
+- uint8x16_t arg0_uint8x16_t;
+- uint8x16_t arg1_uint8x16_t;
+-
+- out_uint8x16x2_t = vzipq_u8 (arg0_uint8x16_t, arg1_uint8x16_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipf32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipf32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipf32 (void)
+-{
+- float32x2x2_t out_float32x2x2_t;
+- float32x2_t arg0_float32x2_t;
+- float32x2_t arg1_float32x2_t;
+-
+- out_float32x2x2_t = vzip_f32 (arg0_float32x2_t, arg1_float32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipp16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipp16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipp16 (void)
+-{
+- poly16x4x2_t out_poly16x4x2_t;
+- poly16x4_t arg0_poly16x4_t;
+- poly16x4_t arg1_poly16x4_t;
+-
+- out_poly16x4x2_t = vzip_p16 (arg0_poly16x4_t, arg1_poly16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipp8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipp8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipp8 (void)
+-{
+- poly8x8x2_t out_poly8x8x2_t;
+- poly8x8_t arg0_poly8x8_t;
+- poly8x8_t arg1_poly8x8_t;
+-
+- out_poly8x8x2_t = vzip_p8 (arg0_poly8x8_t, arg1_poly8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzips16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzips16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzips16 (void)
+-{
+- int16x4x2_t out_int16x4x2_t;
+- int16x4_t arg0_int16x4_t;
+- int16x4_t arg1_int16x4_t;
+-
+- out_int16x4x2_t = vzip_s16 (arg0_int16x4_t, arg1_int16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzips32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzips32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzips32 (void)
+-{
+- int32x2x2_t out_int32x2x2_t;
+- int32x2_t arg0_int32x2_t;
+- int32x2_t arg1_int32x2_t;
+-
+- out_int32x2x2_t = vzip_s32 (arg0_int32x2_t, arg1_int32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzips8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzips8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzips8 (void)
+-{
+- int8x8x2_t out_int8x8x2_t;
+- int8x8_t arg0_int8x8_t;
+- int8x8_t arg1_int8x8_t;
+-
+- out_int8x8x2_t = vzip_s8 (arg0_int8x8_t, arg1_int8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipu16.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipu16' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipu16 (void)
+-{
+- uint16x4x2_t out_uint16x4x2_t;
+- uint16x4_t arg0_uint16x4_t;
+- uint16x4_t arg1_uint16x4_t;
+-
+- out_uint16x4x2_t = vzip_u16 (arg0_uint16x4_t, arg1_uint16x4_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipu32.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipu32' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipu32 (void)
+-{
+- uint32x2x2_t out_uint32x2x2_t;
+- uint32x2_t arg0_uint32x2_t;
+- uint32x2_t arg1_uint32x2_t;
+-
+- out_uint32x2x2_t = vzip_u32 (arg0_uint32x2_t, arg1_uint32x2_t);
+-}
+-
+-/* { dg-final { scan-assembler "vuzp\.32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/neon/vzipu8.c
++++ b/src//dev/null
+@@ -1,20 +0,0 @@
+-/* Test the `vzipu8' ARM Neon intrinsic. */
+-/* This file was autogenerated by neon-testgen. */
+-
+-/* { dg-do assemble } */
+-/* { dg-require-effective-target arm_neon_ok } */
+-/* { dg-options "-save-temps -O0" } */
+-/* { dg-add-options arm_neon } */
+-
+-#include "arm_neon.h"
+-
+-void test_vzipu8 (void)
+-{
+- uint8x8x2_t out_uint8x8x2_t;
+- uint8x8_t arg0_uint8x8_t;
+- uint8x8_t arg1_uint8x8_t;
+-
+- out_uint8x8x2_t = vzip_u8 (arg0_uint8x8_t, arg1_uint8x8_t);
+-}
+-
+-/* { dg-final { scan-assembler "vzip\.8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/pr37780_1.c
+@@ -0,0 +1,48 @@
++/* Test that we can remove the conditional move due to CLZ
++ being defined at zero. */
++
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_arch_v6t2_ok } */
++/* { dg-options "-O2" } */
++/* { dg-add-options arm_arch_v6t2 } */
+
-+/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 2 } } */
++int
++fooctz (int i)
++{
++ return (i == 0) ? 32 : __builtin_ctz (i);
+}
+
-+void
-+check_v4sf (float32_t elemA, float32_t elemB, float32_t elemC, float32_t elemD)
++int
++fooctz2 (int i)
+{
-+ int32_t indx;
-+ const float32_t vec32x4_buf[4] = {A, B, C, D};
-+ float32x4_t vec32x4_src = vld1q_f32 (vec32x4_buf);
-+ float32_t vec32x4_res[4];
-+
-+ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemA));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_1[indx])
-+ abort ();
-+
-+ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemB));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_2[indx])
-+ abort ();
-+
-+ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemC));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_3[indx])
-+ abort ();
-+
-+ vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemD));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_4[indx])
-+ abort ();
-+
-+/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
++ return (i != 0) ? __builtin_ctz (i) : 32;
+}
+
-+void
-+check_v2df (float64_t elemdC, float64_t elemdD)
++unsigned int
++fooctz3 (unsigned int i)
+{
-+ int32_t indx;
-+ const float64_t vec64x2_buf[2] = {AD, BD};
-+ float64x2_t vec64x2_src = vld1q_f64 (vec64x2_buf);
-+ float64_t vec64x2_res[2];
-+
-+ vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdC));
-+
-+ for (indx = 0; indx < 2; indx++)
-+ if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_1[indx])
-+ abort ();
-+
-+ vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdD));
++ return (i > 0) ? __builtin_ctz (i) : 32;
++}
+
-+ for (indx = 0; indx < 2; indx++)
-+ if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_2[indx])
-+ abort ();
++/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
+
-+/* { dg-final { scan-assembler-times "fmul\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[0\\\]" 2 } } */
++int
++fooclz (int i)
++{
++ return (i == 0) ? 32 : __builtin_clz (i);
+}
+
-+void
-+check_v2si (int32_t elemsA, int32_t elemsB)
++int
++fooclz2 (int i)
+{
-+ int32_t indx;
-+ const int32_t vecs32x2_buf[2] = {AS, BS};
-+ int32x2_t vecs32x2_src = vld1_s32 (vecs32x2_buf);
-+ int32_t vecs32x2_res[2];
-+
-+ vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsA));
-+
-+ for (indx = 0; indx < 2; indx++)
-+ if (vecs32x2_res[indx] != expecteds2_1[indx])
-+ abort ();
-+
-+ vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsB));
-+
-+ for (indx = 0; indx < 2; indx++)
-+ if (vecs32x2_res[indx] != expecteds2_2[indx])
-+ abort ();
++ return (i != 0) ? __builtin_clz (i) : 32;
+}
+
-+void
-+check_v2si_unsigned (uint32_t elemusA, uint32_t elemusB)
++unsigned int
++fooclz3 (unsigned int i)
+{
-+ int indx;
-+ const uint32_t vecus32x2_buf[2] = {AUS, BUS};
-+ uint32x2_t vecus32x2_src = vld1_u32 (vecus32x2_buf);
-+ uint32_t vecus32x2_res[2];
-+
-+ vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusA));
-+
-+ for (indx = 0; indx < 2; indx++)
-+ if (vecus32x2_res[indx] != expectedus2_1[indx])
-+ abort ();
-+
-+ vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusB));
++ return (i > 0) ? __builtin_clz (i) : 32;
++}
+
-+ for (indx = 0; indx < 2; indx++)
-+ if (vecus32x2_res[indx] != expectedus2_2[indx])
-+ abort ();
++/* { dg-final { scan-assembler-times "clz\t" 6 } } */
++/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
+--- a/src/gcc/testsuite/gcc.target/arm/pr42574.c
++++ b/src/gcc/testsuite/gcc.target/arm/pr42574.c
+@@ -1,5 +1,5 @@
++/* { dg-do compile { target { arm_thumb1_ok && { ! arm_thumb1_movt_ok } } } } */
+ /* { dg-options "-mthumb -Os -fpic" } */
+-/* { dg-require-effective-target arm_thumb1_ok } */
+ /* { dg-require-effective-target fpic } */
+ /* Make sure the address of glob.c is calculated only once and using
+ a logical shift for the offset (200<<1). */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/short-vfp-1.c
+@@ -0,0 +1,45 @@
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_vfp_ok } */
++/* { dg-options "-mfpu=vfp" } */
+
-+/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.2s, v\[0-9\]+\.2s, v\[0-9\]+\.s\\\[0\\\]" 4 } } */
++int
++test_sisf (float x)
++{
++ return (int)x;
+}
+
-+void
-+check_v4si (int32_t elemsA, int32_t elemsB, int32_t elemsC, int32_t elemsD)
++short
++test_hisf (float x)
+{
-+ int32_t indx;
-+ const int32_t vecs32x4_buf[4] = {AS, BS, CS, DS};
-+ int32x4_t vecs32x4_src = vld1q_s32 (vecs32x4_buf);
-+ int32_t vecs32x4_res[4];
-+
-+ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsA));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecs32x4_res[indx] != expecteds4_1[indx])
-+ abort ();
-+
-+ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsB));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecs32x4_res[indx] != expecteds4_2[indx])
-+ abort ();
-+
-+ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsC));
-+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecs32x4_res[indx] != expecteds4_3[indx])
-+ abort ();
++ return (short)x;
++}
+
-+ vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsD));
++float
++test_sfsi (int x)
++{
++ return (float)x;
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecs32x4_res[indx] != expecteds4_4[indx])
-+ abort ();
++float
++test_sfhi (short x)
++{
++ return (float)x;
+}
+
-+void
-+check_v4si_unsigned (uint32_t elemusA, uint32_t elemusB, uint32_t elemusC,
-+ uint32_t elemusD)
++short
++test_hisi (int x)
+{
-+ int indx;
-+ const uint32_t vecus32x4_buf[4] = {AUS, BUS, CUS, DUS};
-+ uint32x4_t vecus32x4_src = vld1q_u32 (vecus32x4_buf);
-+ uint32_t vecus32x4_res[4];
++ return (short)x;
++}
+
-+ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusA));
++int
++test_sihi (short x)
++{
++ return (int)x;
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecus32x4_res[indx] != expectedus4_1[indx])
-+ abort ();
++/* {dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+,s[0-9]+} 2 }} */
++/* {dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+,s[0-9]+} 2 }} */
++/* {dg-final { scan-assembler-times {vmov\tr[0-9]+,s[0-9]+} 2 }} */
++/* {dg-final { scan-assembler-times {vmov\ts[0-9]+,r[0-9]+} 2 }} */
++/* {dg-final { scan-assembler-times {sxth\tr[0-9]+,r[0-9]+} 2 }} */
+--- /dev/null
++++ b/src/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
+@@ -0,0 +1,25 @@
++/* Test the `vst1Q_laneu64' ARM Neon intrinsic. */
+
-+ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusB));
++/* Detect ICE in the case of an unaligned memory address. */
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecus32x4_res[indx] != expectedus4_2[indx])
-+ abort ();
++/* { dg-do compile } */
++/* { dg-require-effective-target arm_neon_ok } */
++/* { dg-add-options arm_neon } */
+
-+ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusC));
++#include "arm_neon.h"
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecus32x4_res[indx] != expectedus4_3[indx])
-+ abort ();
++unsigned char dummy_store[1000];
+
-+ vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusD));
++void
++foo (unsigned char* addr)
++{
++ uint8x16_t vdata = vld1q_u8 (addr);
++ vst1q_lane_u64 ((uint64_t*) &dummy_store, vreinterpretq_u64_u8 (vdata), 0);
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecus32x4_res[indx] != expectedus4_4[indx])
-+ abort ();
++uint64_t
++bar (uint64x2_t vdata)
++{
++ vdata = vld1q_lane_u64 ((uint64_t*) &dummy_store, vdata, 0);
++ return vgetq_lane_u64 (vdata, 0);
++}
+--- a/src/gcc/testsuite/lib/gcc-dg.exp
++++ b/src/gcc/testsuite/lib/gcc-dg.exp
+@@ -403,6 +403,7 @@ if { [info procs ${tool}_load] != [list] \
+ switch [lindex $result 0] {
+ "pass" { set status "fail" }
+ "fail" { set status "pass" }
++ default { set status [lindex $result 0] }
+ }
+ set result [list $status [lindex $result 1]]
+ }
+--- a/src/gcc/testsuite/lib/target-supports.exp
++++ b/src/gcc/testsuite/lib/target-supports.exp
+@@ -2938,6 +2938,28 @@ proc add_options_for_arm_v8_1a_neon { flags } {
+ return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
+ }
+
++# Add the options needed for ARMv8.2 with the scalar FP16 extension.
++# Also adds the ARMv8 FP options for ARM and for AArch64.
+
-+/* { dg-final { scan-assembler-times "\tmul\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[0\\\]" 8 } } */
++proc add_options_for_arm_v8_2a_fp16_scalar { flags } {
++ if { ! [check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
++ return "$flags"
++ }
++ global et_arm_v8_2a_fp16_scalar_flags
++ return "$flags $et_arm_v8_2a_fp16_scalar_flags"
+}
+
++# Add the options needed for ARMv8.2 with the FP16 extension. Also adds
++# the ARMv8 NEON options for ARM and for AArch64.
+
-+void
-+check_v4hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD)
-+{
-+ int32_t indx;
-+ const int16_t vech16x4_buf[4] = {AH, BH, CH, DH};
-+ int16x4_t vech16x4_src = vld1_s16 (vech16x4_buf);
-+ int16_t vech16x4_res[4];
++proc add_options_for_arm_v8_2a_fp16_neon { flags } {
++ if { ! [check_effective_target_arm_v8_2a_fp16_neon_ok] } {
++ return "$flags"
++ }
++ global et_arm_v8_2a_fp16_neon_flags
++ return "$flags $et_arm_v8_2a_fp16_neon_flags"
++}
+
-+ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhA));
+ proc add_options_for_arm_crc { flags } {
+ if { ! [check_effective_target_arm_crc_ok] } {
+ return "$flags"
+@@ -3024,23 +3046,25 @@ proc check_effective_target_arm_crc_ok { } {
+
+ proc check_effective_target_arm_neon_fp16_ok_nocache { } {
+ global et_arm_neon_fp16_flags
++ global et_arm_neon_flags
+ set et_arm_neon_fp16_flags ""
+- if { [check_effective_target_arm32] } {
++ if { [check_effective_target_arm32]
++ && [check_effective_target_arm_neon_ok] } {
+ foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
+ "-mfpu=neon-fp16 -mfloat-abi=softfp"
+ "-mfp16-format=ieee"
+ "-mfloat-abi=softfp -mfp16-format=ieee"
+ "-mfpu=neon-fp16 -mfp16-format=ieee"
+ "-mfpu=neon-fp16 -mfloat-abi=softfp -mfp16-format=ieee"} {
+- if { [check_no_compiler_messages_nocache arm_neon_fp_16_ok object {
++ if { [check_no_compiler_messages_nocache arm_neon_fp16_ok object {
+ #include "arm_neon.h"
+ float16x4_t
+ foo (float32x4_t arg)
+ {
+ return vcvt_f16_f32 (arg);
+ }
+- } "$flags"] } {
+- set et_arm_neon_fp16_flags $flags
++ } "$et_arm_neon_flags $flags"] } {
++ set et_arm_neon_fp16_flags [concat $et_arm_neon_flags $flags]
+ return 1
+ }
+ }
+@@ -3077,6 +3101,65 @@ proc add_options_for_arm_neon_fp16 { flags } {
+ return "$flags $et_arm_neon_fp16_flags"
+ }
+
++# Return 1 if this is an ARM target supporting the FP16 alternative
++# format. Some multilibs may be incompatible with the options needed. Also
++# set et_arm_neon_fp16_flags to the best options to add.
++
++proc check_effective_target_arm_fp16_alternative_ok_nocache { } {
++ global et_arm_neon_fp16_flags
++ set et_arm_neon_fp16_flags ""
++ if { [check_effective_target_arm32] } {
++ foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
++ "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
++ if { [check_no_compiler_messages_nocache \
++ arm_fp16_alternative_ok object {
++ #if !defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ #error __ARM_FP16_FORMAT_ALTERNATIVE not defined
++ #endif
++ } "$flags -mfp16-format=alternative"] } {
++ set et_arm_neon_fp16_flags "$flags -mfp16-format=alternative"
++ return 1
++ }
++ }
++ }
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vech16x4_res[indx] != expectedh4_1[indx])
-+ abort ();
++ return 0
++}
+
-+ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhB));
++proc check_effective_target_arm_fp16_alternative_ok { } {
++ return [check_cached_effective_target arm_fp16_alternative_ok \
++ check_effective_target_arm_fp16_alternative_ok_nocache]
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vech16x4_res[indx] != expectedh4_2[indx])
-+ abort ();
++# Return 1 if this is an ARM target that supports specifying the FP16
++# none format. Some multilibs may be incompatible with the options needed.
++
++proc check_effective_target_arm_fp16_none_ok_nocache { } {
++ if { [check_effective_target_arm32] } {
++ foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
++ "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
++ if { [check_no_compiler_messages_nocache \
++ arm_fp16_none_ok object {
++ #if defined (__ARM_FP16_FORMAT_ALTERNATIVE)
++ #error __ARM_FP16_FORMAT_ALTERNATIVE defined
++ #endif
++ #if defined (__ARM_FP16_FORMAT_IEEE)
++ #error __ARM_FP16_FORMAT_IEEE defined
++ #endif
++ } "$flags -mfp16-format=none"] } {
++ return 1
++ }
++ }
++ }
+
-+ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhC));
++ return 0
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vech16x4_res[indx] != expectedh4_3[indx])
-+ abort ();
++proc check_effective_target_arm_fp16_none_ok { } {
++ return [check_cached_effective_target arm_fp16_none_ok \
++ check_effective_target_arm_fp16_none_ok_nocache]
++}
+
-+ vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhD));
+ # Return 1 if this is an ARM target supporting -mfpu=neon-fp-armv8
+ # -mfloat-abi=softfp or equivalent options. Some multilibs may be
+ # incompatible with these options. Also set et_arm_v8_neon_flags to the
+@@ -3119,8 +3202,10 @@ proc check_effective_target_arm_v8_neon_ok { } {
+
+ proc check_effective_target_arm_neonv2_ok_nocache { } {
+ global et_arm_neonv2_flags
++ global et_arm_neon_flags
+ set et_arm_neonv2_flags ""
+- if { [check_effective_target_arm32] } {
++ if { [check_effective_target_arm32]
++ && [check_effective_target_arm_neon_ok] } {
+ foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-vfpv4" "-mfpu=neon-vfpv4 -mfloat-abi=softfp"} {
+ if { [check_no_compiler_messages_nocache arm_neonv2_ok object {
+ #include "arm_neon.h"
+@@ -3129,8 +3214,8 @@ proc check_effective_target_arm_neonv2_ok_nocache { } {
+ {
+ return vfma_f32 (a, b, c);
+ }
+- } "$flags"] } {
+- set et_arm_neonv2_flags $flags
++ } "$et_arm_neon_flags $flags"] } {
++ set et_arm_neonv2_flags [concat $et_arm_neon_flags $flags]
+ return 1
+ }
+ }
+@@ -3144,9 +3229,9 @@ proc check_effective_target_arm_neonv2_ok { } {
+ check_effective_target_arm_neonv2_ok_nocache]
+ }
+
+-# Add the options needed for NEON. We need either -mfloat-abi=softfp
+-# or -mfloat-abi=hard, but if one is already specified by the
+-# multilib, use it.
++# Add the options needed for VFP FP16 support. We need either
++# -mfloat-abi=softfp or -mfloat-abi=hard. If one is already specified by
++# the multilib, use it.
+
+ proc add_options_for_arm_fp16 { flags } {
+ if { ! [check_effective_target_arm_fp16_ok] } {
+@@ -3156,9 +3241,32 @@ proc add_options_for_arm_fp16 { flags } {
+ return "$flags $et_arm_fp16_flags"
+ }
+
++# Add the options needed to enable IEEE format half-precision
++# support. This is valid for ARM targets.
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vech16x4_res[indx] != expectedh4_4[indx])
-+ abort ();
++proc add_options_for_arm_fp16_ieee { flags } {
++ if { ! [check_effective_target_arm_fp16_ok] } {
++ return "$flags"
++ }
++ global et_arm_fp16_flags
++ return "$flags $et_arm_fp16_flags -mfp16-format=ieee"
+}
+
-+void
-+check_v4hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
-+ uint16_t elemuhD)
-+{
-+ int indx;
-+ const uint16_t vecuh16x4_buf[4] = {AUH, BUH, CUH, DUH};
-+ uint16x4_t vecuh16x4_src = vld1_u16 (vecuh16x4_buf);
-+ uint16_t vecuh16x4_res[4];
++# Add the options needed to enable ARM Alternative format half-precision
++# support. This is valid for ARM targets.
+
-+ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhA));
++proc add_options_for_arm_fp16_alternative { flags } {
++ if { ! [check_effective_target_arm_fp16_ok] } {
++ return "$flags"
++ }
++ global et_arm_fp16_flags
++ return "$flags $et_arm_fp16_flags -mfp16-format=alternative"
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecuh16x4_res[indx] != expecteduh4_1[indx])
-+ abort ();
+ # Return 1 if this is an ARM target that can support a VFP fp16 variant.
+ # Skip multilibs that are incompatible with these options and set
+-# et_arm_fp16_flags to the best options to add.
++# et_arm_fp16_flags to the best options to add. This test is valid for
++# ARM only.
+
+ proc check_effective_target_arm_fp16_ok_nocache { } {
+ global et_arm_fp16_flags
+@@ -3166,7 +3274,10 @@ proc check_effective_target_arm_fp16_ok_nocache { } {
+ if { ! [check_effective_target_arm32] } {
+ return 0;
+ }
+- if [check-flags [list "" { *-*-* } { "-mfpu=*" } { "-mfpu=*fp16*" "-mfpu=*fpv[4-9]*" "-mfpu=*fpv[1-9][0-9]*" } ]] {
++ if [check-flags \
++ [list "" { *-*-* } { "-mfpu=*" } \
++ { "-mfpu=*fp16*" "-mfpu=*fpv[4-9]*" \
++ "-mfpu=*fpv[1-9][0-9]*" "-mfpu=*fp-armv8*" } ]] {
+ # Multilib flags would override -mfpu.
+ return 0
+ }
+@@ -3202,6 +3313,28 @@ proc check_effective_target_arm_fp16_ok { } {
+ check_effective_target_arm_fp16_ok_nocache]
+ }
+
++# Return 1 if the target supports executing VFP FP16 instructions, 0
++# otherwise. This test is valid for ARM only.
+
-+ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhB));
++proc check_effective_target_arm_fp16_hw { } {
++ if {! [check_effective_target_arm_fp16_ok] } {
++ return 0
++ }
++ global et_arm_fp16_flags
++ check_runtime_nocache arm_fp16_hw {
++ int
++ main (int argc, char **argv)
++ {
++ __fp16 a = 1.0;
++ float r;
++ asm ("vcvtb.f32.f16 %0, %1"
++ : "=w" (r) : "w" (a)
++ : /* No clobbers. */);
++ return (r == 1.0) ? 0 : 1;
++ }
++ } "$et_arm_fp16_flags -mfp16-format=ieee"
++}
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecuh16x4_res[indx] != expecteduh4_2[indx])
-+ abort ();
+ # Creates a series of routines that return 1 if the given architecture
+ # can be selected and a routine to give the flags to select that architecture
+ # Note: Extra flags may be added to disable options from newer compilers
+@@ -3226,7 +3359,10 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
+ v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+ v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+ v8a "-march=armv8-a" __ARM_ARCH_8A__
+- v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ } {
++ v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
++ v8_2a "-march=armv8.2a" __ARM_ARCH_8A__
++ v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__
++ v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
+ eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+ proc check_effective_target_arm_arch_FUNC_ok { } {
+ if { [ string match "*-marm*" "FLAG" ] &&
+@@ -3354,15 +3490,47 @@ proc check_effective_target_arm_cortex_m { } {
+ return 0
+ }
+ return [check_no_compiler_messages arm_cortex_m assembly {
+- #if !defined(__ARM_ARCH_7M__) \
+- && !defined (__ARM_ARCH_7EM__) \
+- && !defined (__ARM_ARCH_6M__)
+- #error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__
++ #if defined(__ARM_ARCH_ISA_ARM)
++ #error __ARM_ARCH_ISA_ARM is defined
+ #endif
+ int i;
+ } "-mthumb"]
+ }
+
++# Return 1 if this is an ARM target where -mthumb causes Thumb-1 to be
++# used and MOVT/MOVW instructions to be available.
++
++proc check_effective_target_arm_thumb1_movt_ok {} {
++ if [check_effective_target_arm_thumb1_ok] {
++ return [check_no_compiler_messages arm_movt object {
++ int
++ foo (void)
++ {
++ asm ("movt r0, #42");
++ }
++ } "-mthumb"]
++ } else {
++ return 0
++ }
++}
+
-+ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhC));
++# Return 1 if this is an ARM target where -mthumb causes Thumb-1 to be
++# used and CBZ and CBNZ instructions are available.
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecuh16x4_res[indx] != expecteduh4_3[indx])
-+ abort ();
++proc check_effective_target_arm_thumb1_cbz_ok {} {
++ if [check_effective_target_arm_thumb1_ok] {
++ return [check_no_compiler_messages arm_movt object {
++ int
++ foo (void)
++ {
++ asm ("cbz r0, 2f\n2:");
++ }
++ } "-mthumb"]
++ } else {
++ return 0
++ }
++}
+
-+ vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhD));
+ # Return 1 if this compilation turns on string_ops_prefer_neon on.
+
+ proc check_effective_target_arm_tune_string_ops_prefer_neon { } {
+@@ -3438,6 +3606,76 @@ proc check_effective_target_arm_v8_1a_neon_ok { } {
+ check_effective_target_arm_v8_1a_neon_ok_nocache]
+ }
+
++# Return 1 if the target supports ARMv8.2 scalar FP16 arithmetic
++# instructions, 0 otherwise. The test is valid for ARM and for AArch64.
++# Record the command line options needed.
+
-+ for (indx = 0; indx < 4; indx++)
-+ if (vecuh16x4_res[indx] != expecteduh4_4[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache { } {
++ global et_arm_v8_2a_fp16_scalar_flags
++ set et_arm_v8_2a_fp16_scalar_flags ""
+
-+/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.4h, v\[0-9\]+\.4h, v\[0-9\]+\.h\\\[0\\\]" 8 } } */
-+}
++ if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
++ return 0;
++ }
+
-+void
-+check_v8hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD,
-+ int16_t elemhE, int16_t elemhF, int16_t elemhG, int16_t elemhH)
-+{
-+ int32_t indx;
-+ const int16_t vech16x8_buf[8] = {AH, BH, CH, DH, EH, FH, GH, HH};
-+ int16x8_t vech16x8_src = vld1q_s16 (vech16x8_buf);
-+ int16_t vech16x8_res[8];
++ # Iterate through sets of options to find the compiler flags that
++ # need to be added to the -march option.
++ foreach flags {"" "-mfpu=fp-armv8" "-mfloat-abi=softfp" \
++ "-mfpu=fp-armv8 -mfloat-abi=softfp"} {
++ if { [check_no_compiler_messages_nocache \
++ arm_v8_2a_fp16_scalar_ok object {
++ #if !defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
++ #error "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined"
++ #endif
++ } "$flags -march=armv8.2-a+fp16"] } {
++ set et_arm_v8_2a_fp16_scalar_flags "$flags -march=armv8.2-a+fp16"
++ return 1
++ }
++ }
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhA));
++ return 0;
++}
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_1[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_scalar_ok { } {
++ return [check_cached_effective_target arm_v8_2a_fp16_scalar_ok \
++ check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache]
++}
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhB));
++# Return 1 if the target supports ARMv8.2 Adv.SIMD FP16 arithmetic
++# instructions, 0 otherwise. The test is valid for ARM and for AArch64.
++# Record the command line options needed.
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_2[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_neon_ok_nocache { } {
++ global et_arm_v8_2a_fp16_neon_flags
++ set et_arm_v8_2a_fp16_neon_flags ""
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhC));
++ if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
++ return 0;
++ }
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_3[indx])
-+ abort ();
++ # Iterate through sets of options to find the compiler flags that
++ # need to be added to the -march option.
++ foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
++ "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
++ if { [check_no_compiler_messages_nocache \
++ arm_v8_2a_fp16_neon_ok object {
++ #if !defined (__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
++ #error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined"
++ #endif
++ } "$flags -march=armv8.2-a+fp16"] } {
++ set et_arm_v8_2a_fp16_neon_flags "$flags -march=armv8.2-a+fp16"
++ return 1
++ }
++ }
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhD));
++ return 0;
++}
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_4[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_neon_ok { } {
++ return [check_cached_effective_target arm_v8_2a_fp16_neon_ok \
++ check_effective_target_arm_v8_2a_fp16_neon_ok_nocache]
++}
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhE));
+ # Return 1 if the target supports executing ARMv8 NEON instructions, 0
+ # otherwise.
+
+@@ -3447,11 +3685,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
+ int
+ main (void)
+ {
+- float32x2_t a;
++ float32x2_t a = { 1.0f, 2.0f };
++ #ifdef __ARM_ARCH_ISA_A64
++ asm ("frinta %0.2s, %1.2s"
++ : "=w" (a)
++ : "w" (a));
++ #else
+ asm ("vrinta.f32 %P0, %P1"
+ : "=w" (a)
+ : "0" (a));
+- return 0;
++ #endif
++ return a[0] == 2.0f;
+ }
+ } [add_options_for_arm_v8_neon ""]]
+ }
+@@ -3494,6 +3738,81 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
+ } [add_options_for_arm_v8_1a_neon ""]]
+ }
+
++# Return 1 if the target supports executing floating point instructions from
++# ARMv8.2 with the FP16 extension, 0 otherwise. The test is valid for ARM and
++# for AArch64.
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_5[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_scalar_hw { } {
++ if { ![check_effective_target_arm_v8_2a_fp16_scalar_ok] } {
++ return 0;
++ }
++ return [check_runtime arm_v8_2a_fp16_scalar_hw_available {
++ int
++ main (void)
++ {
++ __fp16 a = 1.0;
++ __fp16 result;
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhF));
++ #ifdef __ARM_ARCH_ISA_A64
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_6[indx])
-+ abort ();
++ asm ("fabs %h0, %h1"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers. */);
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhG));
++ #else
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_7[indx])
-+ abort ();
++ asm ("vabs.f16 %0, %1"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers. */);
+
-+ vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhH));
++ #endif
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vech16x8_res[indx] != expectedh8_8[indx])
-+ abort ();
++ return (result == 1.0) ? 0 : 1;
++ }
++ } [add_options_for_arm_v8_2a_fp16_scalar ""]]
+}
+
-+void
-+check_v8hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
-+ uint16_t elemuhD, uint16_t elemuhE, uint16_t elemuhF,
-+ uint16_t elemuhG, uint16_t elemuhH)
-+{
-+ int indx;
-+ const uint16_t vecuh16x8_buf[8] = {AUH, BUH, CUH, DUH, EUH, FUH, GUH, HUH};
-+ uint16x8_t vecuh16x8_src = vld1q_u16 (vecuh16x8_buf);
-+ uint16_t vecuh16x8_res[8];
-+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhA));
++# Return 1 if the target supports executing Adv.SIMD instructions from ARMv8.2
++# with the FP16 extension, 0 otherwise. The test is valid for ARM and for
++# AArch64.
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_1[indx])
-+ abort ();
++proc check_effective_target_arm_v8_2a_fp16_neon_hw { } {
++ if { ![check_effective_target_arm_v8_2a_fp16_neon_ok] } {
++ return 0;
++ }
++ return [check_runtime arm_v8_2a_fp16_neon_hw_available {
++ int
++ main (void)
++ {
++ #ifdef __ARM_ARCH_ISA_A64
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhB));
++ __Float16x4_t a = {1.0, -1.0, 1.0, -1.0};
++ __Float16x4_t result;
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_2[indx])
-+ abort ();
++ asm ("fabs %0.4h, %1.4h"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers. */);
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhC));
++ #else
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_3[indx])
-+ abort ();
++ __simd64_float16_t a = {1.0, -1.0, 1.0, -1.0};
++ __simd64_float16_t result;
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhD));
++ asm ("vabs.f16 %P0, %P1"
++ : "=w"(result)
++ : "w"(a)
++ : /* No clobbers. */);
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_4[indx])
-+ abort ();
++ #endif
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhE));
++ return (result[0] == 1.0) ? 0 : 1;
++ }
++ } [add_options_for_arm_v8_2a_fp16_neon ""]]
++}
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_5[indx])
-+ abort ();
+ # Return 1 if this is a ARM target with NEON enabled.
+
+ proc check_effective_target_arm_neon { } {
+@@ -3528,6 +3847,25 @@ proc check_effective_target_arm_neonv2 { } {
+ }
+ }
+
++# Return 1 if this is an ARM target with load acquire and store release
++# instructions for 8-, 16- and 32-bit types.
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhF));
++proc check_effective_target_arm_acq_rel { } {
++ return [check_no_compiler_messages arm_acq_rel object {
++ void
++ load_acquire_store_release (void)
++ {
++ asm ("lda r0, [r1]\n\t"
++ "stl r0, [r1]\n\t"
++ "ldah r0, [r1]\n\t"
++ "stlh r0, [r1]\n\t"
++ "ldab r0, [r1]\n\t"
++ "stlb r0, [r1]"
++ : : : "r0", "memory");
++ }
++ }]
++}
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_6[indx])
-+ abort ();
+ # Return 1 if this a Loongson-2E or -2F target using an ABI that supports
+ # the Loongson vector modes.
+
+@@ -4382,6 +4720,8 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
+ set et_vect_widen_sum_hi_to_si_pattern_saved 0
+ if { [istarget powerpc*-*-*]
+ || [istarget aarch64*-*-*]
++ || ([istarget arm*-*-*] &&
++ [check_effective_target_arm_neon_ok])
+ || [istarget ia64-*-*] } {
+ set et_vect_widen_sum_hi_to_si_pattern_saved 1
+ }
+@@ -5757,6 +6097,8 @@ proc check_effective_target_sync_int_long { } {
+ || [istarget aarch64*-*-*]
+ || [istarget alpha*-*-*]
+ || [istarget arm*-*-linux-*]
++ || ([istarget arm*-*-*]
++ && [check_effective_target_arm_acq_rel])
+ || [istarget bfin*-*linux*]
+ || [istarget hppa*-*linux*]
+ || [istarget s390*-*-*]
+@@ -5790,6 +6132,8 @@ proc check_effective_target_sync_char_short { } {
+ || [istarget i?86-*-*] || [istarget x86_64-*-*]
+ || [istarget alpha*-*-*]
+ || [istarget arm*-*-linux-*]
++ || ([istarget arm*-*-*]
++ && [check_effective_target_arm_acq_rel])
+ || [istarget hppa*-*linux*]
+ || [istarget s390*-*-*]
+ || [istarget powerpc*-*-*]
+--- a/src/gcc/tree-inline.c
++++ b/src/gcc/tree-inline.c
+@@ -244,6 +244,7 @@ remap_ssa_name (tree name, copy_body_data *id)
+ /* At least IPA points-to info can be directly transferred. */
+ if (id->src_cfun->gimple_df
+ && id->src_cfun->gimple_df->ipa_pta
++ && POINTER_TYPE_P (TREE_TYPE (name))
+ && (pi = SSA_NAME_PTR_INFO (name))
+ && !pi->pt.anything)
+ {
+@@ -276,6 +277,7 @@ remap_ssa_name (tree name, copy_body_data *id)
+ /* At least IPA points-to info can be directly transferred. */
+ if (id->src_cfun->gimple_df
+ && id->src_cfun->gimple_df->ipa_pta
++ && POINTER_TYPE_P (TREE_TYPE (name))
+ && (pi = SSA_NAME_PTR_INFO (name))
+ && !pi->pt.anything)
+ {
+--- a/src/gcc/tree-scalar-evolution.c
++++ b/src/gcc/tree-scalar-evolution.c
+@@ -1937,6 +1937,36 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
+ res = chrec_convert (type, chrec1, at_stmt);
+ break;
+
++ case BIT_AND_EXPR:
++ /* Given int variable A, handle A&0xffff as (int)(unsigned short)A.
++ If A is SCEV and its value is in the range of representable set
++ of type unsigned short, the result expression is a (no-overflow)
++ SCEV. */
++ res = chrec_dont_know;
++ if (tree_fits_uhwi_p (rhs2))
++ {
++ int precision;
++ unsigned HOST_WIDE_INT val = tree_to_uhwi (rhs2);
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhG));
++ val ++;
++ /* Skip if value of rhs2 wraps in unsigned HOST_WIDE_INT or
++ it's not the maximum value of a smaller type than rhs1. */
++ if (val != 0
++ && (precision = exact_log2 (val)) > 0
++ && (unsigned) precision < TYPE_PRECISION (TREE_TYPE (rhs1)))
++ {
++ tree utype = build_nonstandard_integer_type (precision, 1);
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_7[indx])
-+ abort ();
++ if (TYPE_PRECISION (utype) < TYPE_PRECISION (TREE_TYPE (rhs1)))
++ {
++ chrec1 = analyze_scalar_evolution (loop, rhs1);
++ chrec1 = chrec_convert (utype, chrec1, at_stmt);
++ res = chrec_convert (TREE_TYPE (rhs1), chrec1, at_stmt);
++ }
++ }
++ }
++ break;
+
-+ vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhH));
+ default:
+ res = chrec_dont_know;
+ break;
+--- a/src/gcc/tree-ssa-address.c
++++ b/src/gcc/tree-ssa-address.c
+@@ -877,6 +877,10 @@ copy_ref_info (tree new_ref, tree old_ref)
+ && TREE_CODE (old_ref) == MEM_REF
+ && !(TREE_CODE (new_ref) == TARGET_MEM_REF
+ && (TMR_INDEX2 (new_ref)
++ /* TODO: The conditions below can be relaxed if TMR_INDEX
++ is an induction variable and its initial value and
++ step are aligned. */
++ || (TMR_INDEX (new_ref) && !TMR_STEP (new_ref))
+ || (TMR_STEP (new_ref)
+ && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
+ < align)))))
+--- a/src/gcc/tree-ssa-ccp.c
++++ b/src/gcc/tree-ssa-ccp.c
+@@ -229,13 +229,12 @@ debug_lattice_value (ccp_prop_value_t val)
+ fprintf (stderr, "\n");
+ }
+
+-/* Extend NONZERO_BITS to a full mask, with the upper bits being set. */
++/* Extend NONZERO_BITS to a full mask, based on sgn. */
+
+ static widest_int
+-extend_mask (const wide_int &nonzero_bits)
++extend_mask (const wide_int &nonzero_bits, signop sgn)
+ {
+- return (wi::mask <widest_int> (wi::get_precision (nonzero_bits), true)
+- | widest_int::from (nonzero_bits, UNSIGNED));
++ return widest_int::from (nonzero_bits, sgn);
+ }
+
+ /* Compute a default value for variable VAR and store it in the
+@@ -284,7 +283,7 @@ get_default_value (tree var)
+ {
+ val.lattice_val = CONSTANT;
+ val.value = build_zero_cst (TREE_TYPE (var));
+- val.mask = extend_mask (nonzero_bits);
++ val.mask = extend_mask (nonzero_bits, TYPE_SIGN (TREE_TYPE (var)));
+ }
+ }
+ }
+@@ -1934,7 +1933,7 @@ evaluate_stmt (gimple *stmt)
+ {
+ val.lattice_val = CONSTANT;
+ val.value = build_zero_cst (TREE_TYPE (lhs));
+- val.mask = extend_mask (nonzero_bits);
++ val.mask = extend_mask (nonzero_bits, TYPE_SIGN (TREE_TYPE (lhs)));
+ is_constant = true;
+ }
+ else
+@@ -1945,7 +1944,8 @@ evaluate_stmt (gimple *stmt)
+ if (nonzero_bits == 0)
+ val.mask = 0;
+ else
+- val.mask = val.mask & extend_mask (nonzero_bits);
++ val.mask = val.mask & extend_mask (nonzero_bits,
++ TYPE_SIGN (TREE_TYPE (lhs)));
+ }
+ }
+ }
+--- a/src/gcc/tree-ssa-strlen.c
++++ b/src/gcc/tree-ssa-strlen.c
+@@ -2260,7 +2260,7 @@ public:
+ };
+
+ /* Callback for walk_dominator_tree. Attempt to optimize various
+- string ops by remembering string lenths pointed by pointer SSA_NAMEs. */
++ string ops by remembering string lengths pointed by pointer SSA_NAMEs. */
+
+ edge
+ strlen_dom_walker::before_dom_children (basic_block bb)
+--- a/src/gcc/tree-vect-data-refs.c
++++ b/src/gcc/tree-vect-data-refs.c
+@@ -2238,6 +2238,7 @@ vect_analyze_group_access_1 (struct data_reference *dr)
+ {
+ GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = stmt;
+ GROUP_SIZE (vinfo_for_stmt (stmt)) = groupsize;
++ GROUP_GAP (stmt_info) = groupsize - 1;
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+--- a/src/gcc/tree-vect-loop-manip.c
++++ b/src/gcc/tree-vect-loop-manip.c
+@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see
+ #include "cfgloop.h"
+ #include "tree-scalar-evolution.h"
+ #include "tree-vectorizer.h"
++#include "tree-ssa-loop-ivopts.h"
+
+ /*************************************************************************
+ Simple Loop Peeling Utilities
+@@ -1594,10 +1595,26 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
+ }
+
+ /* FORNOW: We do not transform initial conditions of IVs
++ which evolution functions are not invariants in the loop. */
+
-+ for (indx = 0; indx < 8; indx++)
-+ if (vecuh16x8_res[indx] != expecteduh8_8[indx])
-+ abort ();
++ if (!expr_invariant_in_loop_p (loop, evolution_part))
++ {
++ if (dump_enabled_p ())
++ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
++ "evolution not invariant in loop.\n");
++ return false;
++ }
+
-+/* { dg-final { scan-assembler-times "mul\tv\[0-9\]+\.8h, v\[0-9\]+\.8h, v\[0-9\]+\.h\\\[0\\\]" 16 } } */
++ /* FORNOW: We do not transform initial conditions of IVs
+ which evolution functions are a polynomial of degree >= 2. */
+
+ if (tree_is_chrec (evolution_part))
+- return false;
++ {
++ if (dump_enabled_p ())
++ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
++ "evolution is chrec.\n");
++ return false;
++ }
+ }
+
+ return true;
+--- a/src/gcc/tree-vect-patterns.c
++++ b/src/gcc/tree-vect-patterns.c
+@@ -2136,32 +2136,313 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
+ return pattern_stmt;
+ }
+
+-/* Detect multiplication by constant which are postive or negatives of power 2,
+- and convert them to shift patterns.
++/* Return true iff the target has a vector optab implementing the operation
++ CODE on type VECTYPE. */
+
+- Mult with constants that are postive power of two.
+- type a_t;
+- type b_t
+- S1: b_t = a_t * n
++static bool
++target_has_vecop_for_code (tree_code code, tree vectype)
++{
++ optab voptab = optab_for_tree_code (code, vectype, optab_vector);
++ return voptab
++ && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
-+
-+int
-+main (void)
+
+- or
++/* Verify that the target has optabs of VECTYPE to perform all the steps
++ needed by the multiplication-by-immediate synthesis algorithm described by
++ ALG and VAR. If SYNTH_SHIFT_P is true ensure that vector addition is
++ present. Return true iff the target supports all the steps. */
++
++static bool
++target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
++ tree vectype, bool synth_shift_p)
+{
-+ check_v2sf (_elemA, _elemB);
-+ check_v4sf (_elemA, _elemB, _elemC, _elemD);
-+ check_v2df (_elemdC, _elemdD);
-+ check_v2si (_elemsA, _elemsB);
-+ check_v4si (_elemsA, _elemsB, _elemsC, _elemsD);
-+ check_v4hi (_elemhA, _elemhB, _elemhC, _elemhD);
-+ check_v8hi (_elemhA, _elemhB, _elemhC, _elemhD,
-+ _elemhE, _elemhF, _elemhG, _elemhH);
-+ check_v2si_unsigned (_elemusA, _elemusB);
-+ check_v4si_unsigned (_elemusA, _elemusB, _elemusC, _elemusD);
-+ check_v4hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD);
-+ check_v8hi_unsigned (_elemuhA, _elemuhB, _elemuhC, _elemuhD,
-+ _elemuhE, _elemuhF, _elemuhG, _elemuhH);
++ if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
++ return false;
+
-+ return 0;
-+}
++ bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
++ bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/struct_return.c
-@@ -0,0 +1,31 @@
-+/* Test the absence of a spurious move from x8 to x0 for functions
-+ return structures. */
-+/* { dg-do compile } */
-+/* { dg-options "-O2" } */
++ if (var == negate_variant
++ && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
++ return false;
+
-+struct s
-+{
-+ long x;
-+ long y;
-+ long z;
-+};
++ /* If we must synthesize shifts with additions make sure that vector
++ addition is available. */
++ if ((var == add_variant || synth_shift_p) && !supports_vplus)
++ return false;
+
-+struct s __attribute__((noinline))
-+foo (long a, long d, long c)
-+{
-+ struct s b;
-+ b.x = a;
-+ b.y = d;
-+ b.z = c;
-+ return b;
-+}
++ for (int i = 1; i < alg->ops; i++)
++ {
++ switch (alg->op[i])
++ {
++ case alg_shift:
++ break;
++ case alg_add_t_m2:
++ case alg_add_t2_m:
++ case alg_add_factor:
++ if (!supports_vplus)
++ return false;
++ break;
++ case alg_sub_t_m2:
++ case alg_sub_t2_m:
++ case alg_sub_factor:
++ if (!supports_vminus)
++ return false;
++ break;
++ case alg_unknown:
++ case alg_m:
++ case alg_zero:
++ case alg_impossible:
++ return false;
++ default:
++ gcc_unreachable ();
++ }
++ }
+
-+int
-+main (void)
-+{
-+ struct s x;
-+ x = foo ( 10, 20, 30);
-+ return x.x + x.y + x.z;
++ return true;
+}
+
-+/* { dg-final { scan-assembler-not "mov\tx0, x8" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_1.c
-@@ -0,0 +1,11 @@
-+/* { dg-do compile } */
-+/* { dg-options "-O2 --save-temps" } */
++/* Synthesize a left shift of OP by AMNT bits using a series of additions and
++ putting the final result in DEST. Append all statements but the last into
++ VINFO. Return the last statement. */
+
-+int
-+f (int a, ...)
++static gimple *
++synth_lshift_by_additions (tree dest, tree op, HOST_WIDE_INT amnt,
++ stmt_vec_info vinfo)
+{
-+ /* { dg-final { scan-assembler-not "str" } } */
-+ return a;
++ HOST_WIDE_INT i;
++ tree itype = TREE_TYPE (op);
++ tree prev_res = op;
++ gcc_assert (amnt >= 0);
++ for (i = 0; i < amnt; i++)
++ {
++ tree tmp_var = (i < amnt - 1) ? vect_recog_temp_ssa_var (itype, NULL)
++ : dest;
++ gimple *stmt
++ = gimple_build_assign (tmp_var, PLUS_EXPR, prev_res, prev_res);
++ prev_res = tmp_var;
++ if (i < amnt - 1)
++ append_pattern_def_seq (vinfo, stmt);
++ else
++ return stmt;
++ }
++ gcc_unreachable ();
++ return NULL;
+}
+
-+/* { dg-final { cleanup-saved-temps } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_2.c
-@@ -0,0 +1,18 @@
-+/* { dg-do compile } */
-+/* { dg-options "-O2 --save-temps" } */
++/* Helper for vect_synth_mult_by_constant. Apply a binary operation
++ CODE to operands OP1 and OP2, creating a new temporary SSA var in
++ the process if necessary. Append the resulting assignment statements
++ to the sequence in STMT_VINFO. Return the SSA variable that holds the
++ result of the binary operation. If SYNTH_SHIFT_P is true synthesize
++ left shifts using additions. */
+
-+int
-+foo (char *fmt, ...)
++static tree
++apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
++ stmt_vec_info stmt_vinfo, bool synth_shift_p)
+{
-+ int d;
-+ __builtin_va_list ap;
++ if (integer_zerop (op2)
++ && (code == LSHIFT_EXPR
++ || code == PLUS_EXPR))
++ {
++ gcc_assert (TREE_CODE (op1) == SSA_NAME);
++ return op1;
++ }
+
-+ __builtin_va_start (ap, fmt);
-+ d = __builtin_va_arg (ap, int);
-+ __builtin_va_end (ap);
++ gimple *stmt;
++ tree itype = TREE_TYPE (op1);
++ tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+
-+ /* { dg-final { scan-assembler-not "x7" } } */
-+ return d;
-+}
++ if (code == LSHIFT_EXPR
++ && synth_shift_p)
++ {
++ stmt = synth_lshift_by_additions (tmp_var, op1, TREE_INT_CST_LOW (op2),
++ stmt_vinfo);
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ return tmp_var;
++ }
+
-+/* { dg-final { cleanup-saved-temps } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/aarch64/va_arg_3.c
-@@ -0,0 +1,26 @@
-+/* { dg-do compile } */
-+/* { dg-options "-O2 --save-temps" } */
++ stmt = gimple_build_assign (tmp_var, code, op1, op2);
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ return tmp_var;
++}
+
-+int d2i (double a);
++/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
++ and simple arithmetic operations to be vectorized. Record the statements
++ produced in STMT_VINFO and return the last statement in the sequence or
++ NULL if it's not possible to synthesize such a multiplication.
++ This function mirrors the behavior of expand_mult_const in expmed.c but
++ works on tree-ssa form. */
+
-+int
-+foo (char *fmt, ...)
++static gimple *
++vect_synth_mult_by_constant (tree op, tree val,
++ stmt_vec_info stmt_vinfo)
+{
-+ int d, e;
-+ double f, g;
-+ __builtin_va_list ap;
-+
-+ __builtin_va_start (ap, fmt);
-+ d = __builtin_va_arg (ap, int);
-+ f = __builtin_va_arg (ap, double);
-+ g = __builtin_va_arg (ap, double);
-+ d += d2i (f);
-+ d += d2i (g);
-+ __builtin_va_end (ap);
++ tree itype = TREE_TYPE (op);
++ machine_mode mode = TYPE_MODE (itype);
++ struct algorithm alg;
++ mult_variant variant;
++ if (!tree_fits_shwi_p (val))
++ return NULL;
++
++ /* Multiplication synthesis by shifts, adds and subs can introduce
++ signed overflow where the original operation didn't. Perform the
++ operations on an unsigned type and cast back to avoid this.
++ In the future we may want to relax this for synthesis algorithms
++ that we can prove do not cause unexpected overflow. */
++ bool cast_to_unsigned_p = !TYPE_OVERFLOW_WRAPS (itype);
++
++ tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
++
++ /* Targets that don't support vector shifts but support vector additions
++ can synthesize shifts that way. */
++ bool synth_shift_p = !vect_supportable_shift (LSHIFT_EXPR, multtype);
++
++ HOST_WIDE_INT hwval = tree_to_shwi (val);
++ /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
++ The vectorizer's benefit analysis will decide whether it's beneficial
++ to do this. */
++ bool possible = choose_mult_variant (mode, hwval, &alg,
++ &variant, MAX_COST);
++ if (!possible)
++ return NULL;
+
+- Mult with constants that are negative power of two.
+- S2: b_t = a_t * -n
++ tree vectype = get_vectype_for_scalar_type (multtype);
+
-+ /* { dg-final { scan-assembler-not "x7" } } */
-+ /* { dg-final { scan-assembler-not "q7" } } */
-+ return d;
-+}
++ if (!vectype
++ || !target_supports_mult_synth_alg (&alg, variant,
++ vectype, synth_shift_p))
++ return NULL;
+
-+/* { dg-final { cleanup-saved-temps } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/armv5_thumb_isa.c
-@@ -0,0 +1,8 @@
-+/* { dg-require-effective-target arm_arch_v5_ok } */
-+/* { dg-add-options arm_arch_v5 } */
++ tree accumulator;
+
-+#if __ARM_ARCH_ISA_THUMB
-+#error "__ARM_ARCH_ISA_THUMB defined for ARMv5"
-+#endif
++ /* Clear out the sequence of statements so we can populate it below. */
++ STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
++ gimple *stmt = NULL;
+
-+int foo;
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
-@@ -0,0 +1,19 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_neon_ok } */
-+/* { dg-options "-O3" } */
-+/* { dg-add-options arm_neon } */
++ if (cast_to_unsigned_p)
++ {
++ tree tmp_op = vect_recog_temp_ssa_var (multtype, NULL);
++ stmt = gimple_build_assign (tmp_op, CONVERT_EXPR, op);
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ op = tmp_op;
++ }
+
++ if (alg.op[0] == alg_zero)
++ accumulator = build_int_cst (multtype, 0);
++ else
++ accumulator = op;
+
++ bool needs_fixup = (variant == negate_variant)
++ || (variant == add_variant);
+
-+int
-+t6 (int len, void * dummy, short * __restrict x)
-+{
-+ len = len & ~31;
-+ int result = 0;
-+ __asm volatile ("");
-+ for (int i = 0; i < len; i++)
-+ result += x[i];
-+ return result;
-+}
++ for (int i = 1; i < alg.ops; i++)
++ {
++ tree shft_log = build_int_cst (multtype, alg.log[i]);
++ tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
++ tree tmp_var = NULL_TREE;
+
-+/* { dg-final { scan-assembler "vaddw\.s16" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
-@@ -0,0 +1,18 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_neon_ok } */
-+/* { dg-options "-O3" } */
-+/* { dg-add-options arm_neon } */
++ switch (alg.op[i])
++ {
++ case alg_shift:
++ if (synth_shift_p)
++ stmt
++ = synth_lshift_by_additions (accum_tmp, accumulator, alg.log[i],
++ stmt_vinfo);
++ else
++ stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
++ shft_log);
++ break;
++ case alg_add_t_m2:
++ tmp_var
++ = apply_binop_and_append_stmt (LSHIFT_EXPR, op, shft_log,
++ stmt_vinfo, synth_shift_p);
++ stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
++ tmp_var);
++ break;
++ case alg_sub_t_m2:
++ tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
++ shft_log, stmt_vinfo,
++ synth_shift_p);
++ /* In some algorithms the first step involves zeroing the
++ accumulator. If subtracting from such an accumulator,
++ just emit the negation directly. */
++ if (integer_zerop (accumulator))
++ stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, tmp_var);
++ else
++ stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
++ tmp_var);
++ break;
++ case alg_add_t2_m:
++ tmp_var
++ = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
++ stmt_vinfo, synth_shift_p);
++ stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
++ break;
++ case alg_sub_t2_m:
++ tmp_var
++ = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
++ stmt_vinfo, synth_shift_p);
++ stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
++ break;
++ case alg_add_factor:
++ tmp_var
++ = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
++ stmt_vinfo, synth_shift_p);
++ stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
++ tmp_var);
++ break;
++ case alg_sub_factor:
++ tmp_var
++ = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
++ stmt_vinfo, synth_shift_p);
++ stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
++ accumulator);
++ break;
++ default:
++ gcc_unreachable ();
++ }
++ /* We don't want to append the last stmt in the sequence to stmt_vinfo
++ but rather return it directly. */
+
++ if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ accumulator = accum_tmp;
++ }
++ if (variant == negate_variant)
++ {
++ tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
++ stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
++ accumulator = accum_tmp;
++ if (cast_to_unsigned_p)
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ }
++ else if (variant == add_variant)
++ {
++ tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
++ stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
++ accumulator = accum_tmp;
++ if (cast_to_unsigned_p)
++ append_pattern_def_seq (stmt_vinfo, stmt);
++ }
++ /* Move back to a signed if needed. */
++ if (cast_to_unsigned_p)
++ {
++ tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
++ stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
++ }
+
-+int
-+t6 (int len, void * dummy, int * __restrict x)
-+{
-+ len = len & ~31;
-+ long long result = 0;
-+ __asm volatile ("");
-+ for (int i = 0; i < len; i++)
-+ result += x[i];
-+ return result;
++ return stmt;
+}
+
-+/* { dg-final { scan-assembler "vaddw\.s32" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
-@@ -0,0 +1,18 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_neon_ok } */
-+/* { dg-options "-O3" } */
-+/* { dg-add-options arm_neon } */
++/* Detect multiplication by constant and convert it into a sequence of
++ shifts and additions, subtractions, negations. We reuse the
++ choose_mult_variant algorithms from expmed.c
+
+ Input/Output:
+
+ STMTS: Contains a stmt from which the pattern search begins,
+- i.e. the mult stmt. Convert the mult operation to LSHIFT if
+- constant operand is a power of 2.
+- type a_t, b_t
+- S1': b_t = a_t << log2 (n)
+-
+- Convert the mult operation to LSHIFT and followed by a NEGATE
+- if constant operand is a negative power of 2.
+- type a_t, b_t, res_T;
+- S2': b_t = a_t << log2 (n)
+- S3': res_T = - (b_t)
++ i.e. the mult stmt.
+
+ Output:
+
+@@ -2169,8 +2450,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
+
+ * TYPE_OUT: The type of the output of this pattern.
+
+- * Return value: A new stmt that will be used to replace the multiplication
+- S1 or S2 stmt. */
++ * Return value: A new stmt that will be used to replace
++ the multiplication. */
+
+ static gimple *
+ vect_recog_mult_pattern (vec<gimple *> *stmts,
+@@ -2178,11 +2459,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
+ {
+ gimple *last_stmt = stmts->pop ();
+ tree oprnd0, oprnd1, vectype, itype;
+- gimple *pattern_stmt, *def_stmt;
+- optab optab;
++ gimple *pattern_stmt;
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+- int power2_val, power2_neg_val;
+- tree shift;
+
+ if (!is_gimple_assign (last_stmt))
+ return NULL;
+@@ -2206,52 +2484,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
+
+ /* If the target can handle vectorized multiplication natively,
+ don't attempt to optimize this. */
+- optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+- if (optab != unknown_optab)
++ optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
++ if (mul_optab != unknown_optab)
+ {
+ machine_mode vec_mode = TYPE_MODE (vectype);
+- int icode = (int) optab_handler (optab, vec_mode);
++ int icode = (int) optab_handler (mul_optab, vec_mode);
+ if (icode != CODE_FOR_nothing)
+- return NULL;
++ return NULL;
+ }
+
+- /* If target cannot handle vector left shift then we cannot
+- optimize and bail out. */
+- optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
+- if (!optab
+- || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+- return NULL;
+-
+- power2_val = wi::exact_log2 (oprnd1);
+- power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
+-
+- /* Handle constant operands that are postive or negative powers of 2. */
+- if (power2_val != -1)
+- {
+- shift = build_int_cst (itype, power2_val);
+- pattern_stmt
+- = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+- LSHIFT_EXPR, oprnd0, shift);
+- }
+- else if (power2_neg_val != -1)
+- {
+- /* If the target cannot handle vector NEGATE then we cannot
+- do the optimization. */
+- optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
+- if (!optab
+- || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+- return NULL;
+-
+- shift = build_int_cst (itype, power2_neg_val);
+- def_stmt
+- = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+- LSHIFT_EXPR, oprnd0, shift);
+- new_pattern_def_seq (stmt_vinfo, def_stmt);
+- pattern_stmt
+- = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
+- NEGATE_EXPR, gimple_assign_lhs (def_stmt));
+- }
+- else
++ pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
++ if (!pattern_stmt)
+ return NULL;
+
+ /* Pattern detected. */
+--- a/src/gcc/tree-vect-stmts.c
++++ b/src/gcc/tree-vect-stmts.c
+@@ -6323,12 +6323,22 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
+ gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P (stmt_info));
+
+ first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
++ group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+
++ if (!slp
++ && !PURE_SLP_STMT (stmt_info)
++ && !STMT_VINFO_STRIDED_P (stmt_info))
++ {
++ if (vect_load_lanes_supported (vectype, group_size))
++ load_lanes_p = true;
++ else if (!vect_grouped_load_supported (vectype, group_size))
++ return false;
++ }
+
+ /* If this is single-element interleaving with an element distance
+ that leaves unused vector loads around punt - we at least create
+ very sub-optimal code in that case (and blow up memory,
+ see PR65518). */
+- bool force_peeling = false;
+ if (first_stmt == stmt
+ && !GROUP_NEXT_ELEMENT (stmt_info))
+ {
+@@ -6342,7 +6352,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
+ }
+
+ /* Single-element interleaving requires peeling for gaps. */
+- force_peeling = true;
++ gcc_assert (GROUP_GAP (stmt_info));
+ }
+
+ /* If there is a gap in the end of the group or the group size cannot
+@@ -6350,9 +6360,8 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
+ elements in the last iteration and thus need to peel that off. */
+ if (loop_vinfo
+ && ! STMT_VINFO_STRIDED_P (stmt_info)
+- && (force_peeling
+- || GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
+- || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
++ && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
++ || (!slp && !load_lanes_p && vf % group_size != 0)))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+@@ -6372,8 +6381,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
+ if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
+ slp_perm = true;
+
+- group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+-
+ /* ??? The following is overly pessimistic (as well as the loop
+ case above) in the case we can statically determine the excess
+ elements loaded are within the bounds of a decl that is accessed.
+@@ -6386,16 +6393,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
+ return false;
+ }
+
+- if (!slp
+- && !PURE_SLP_STMT (stmt_info)
+- && !STMT_VINFO_STRIDED_P (stmt_info))
+- {
+- if (vect_load_lanes_supported (vectype, group_size))
+- load_lanes_p = true;
+- else if (!vect_grouped_load_supported (vectype, group_size))
+- return false;
+- }
+-
+ /* Invalidate assumptions made by dependence analysis when vectorization
+ on the unrolled body effectively re-orders stmts. */
+ if (!PURE_SLP_STMT (stmt_info)
+--- a/src/gcc/tree-vectorizer.c
++++ b/src/gcc/tree-vectorizer.c
+@@ -794,38 +794,142 @@ make_pass_slp_vectorize (gcc::context *ctxt)
+ This should involve global alignment analysis and in the future also
+ array padding. */
+
++static unsigned get_vec_alignment_for_type (tree);
++static hash_map<tree, unsigned> *type_align_map;
+
-+int
-+t6 (int len, void * dummy, unsigned short * __restrict x)
++/* Return alignment of array's vector type corresponding to scalar type.
++ 0 if no vector type exists. */
++static unsigned
++get_vec_alignment_for_array_type (tree type)
+{
-+ len = len & ~31;
-+ unsigned int result = 0;
-+ __asm volatile ("");
-+ for (int i = 0; i < len; i++)
-+ result += x[i];
-+ return result;
-+}
++ gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
-+/* { dg-final { scan-assembler "vaddw.u16" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
-@@ -0,0 +1,18 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_neon_ok } */
-+/* { dg-options "-O3" } */
-+/* { dg-add-options arm_neon } */
++ tree vectype = get_vectype_for_scalar_type (strip_array_types (type));
++ if (!vectype
++ || !TYPE_SIZE (type)
++ || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST
++ || tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (vectype)))
++ return 0;
+
++ return TYPE_ALIGN (vectype);
++}
+
-+int
-+t6 (int len, void * dummy, unsigned int * __restrict x)
++/* Return alignment of field having maximum alignment of vector type
++ corresponding to it's scalar type. For now, we only consider fields whose
++ offset is a multiple of it's vector alignment.
++ 0 if no suitable field is found. */
++static unsigned
++get_vec_alignment_for_record_type (tree type)
+{
-+ len = len & ~31;
-+ unsigned long long result = 0;
-+ __asm volatile ("");
-+ for (int i = 0; i < len; i++)
-+ result += x[i];
-+ return result;
-+}
++ gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+
-+/* { dg-final { scan-assembler "vaddw\.u32" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
-@@ -0,0 +1,19 @@
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_neon_ok } */
-+/* { dg-options "-O3" } */
-+/* { dg-add-options arm_neon } */
++ unsigned max_align = 0, alignment;
++ HOST_WIDE_INT offset;
++ tree offset_tree;
+
++ if (TYPE_PACKED (type))
++ return 0;
+
++ unsigned *slot = type_align_map->get (type);
++ if (slot)
++ return *slot;
+
-+int
-+t6 (int len, void * dummy, char * __restrict x)
-+{
-+ len = len & ~31;
-+ unsigned short result = 0;
-+ __asm volatile ("");
-+ for (int i = 0; i < len; i++)
-+ result += x[i];
-+ return result;
++ for (tree field = first_field (type);
++ field != NULL_TREE;
++ field = DECL_CHAIN (field))
++ {
++ /* Skip if not FIELD_DECL or if alignment is set by user. */
++ if (TREE_CODE (field) != FIELD_DECL
++ || DECL_USER_ALIGN (field)
++ || DECL_ARTIFICIAL (field))
++ continue;
++
++ /* We don't need to process the type further if offset is variable,
++ since the offsets of remaining members will also be variable. */
++ if (TREE_CODE (DECL_FIELD_OFFSET (field)) != INTEGER_CST
++ || TREE_CODE (DECL_FIELD_BIT_OFFSET (field)) != INTEGER_CST)
++ break;
++
++ /* Similarly stop processing the type if offset_tree
++ does not fit in unsigned HOST_WIDE_INT. */
++ offset_tree = bit_position (field);
++ if (!tree_fits_uhwi_p (offset_tree))
++ break;
++
++ offset = tree_to_uhwi (offset_tree);
++ alignment = get_vec_alignment_for_type (TREE_TYPE (field));
++
++ /* Get maximum alignment of vectorized field/array among those members
++ whose offset is a multiple of the vector alignment. */
++ if (alignment
++ && (offset % alignment == 0)
++ && (alignment > max_align))
++ max_align = alignment;
++ }
++
++ type_align_map->put (type, max_align);
++ return max_align;
+}
+
-+/* { dg-final { scan-assembler "vaddw\.u8" } } */
---- /dev/null
-+++ b/src/gcc/testsuite/gcc.target/arm/pr37780_1.c
-@@ -0,0 +1,48 @@
-+/* Test that we can remove the conditional move due to CLZ
-+ being defined at zero. */
++/* Return alignment of vector type corresponding to decl's scalar type
++ or 0 if it doesn't exist or the vector alignment is less than
++ decl's alignment. */
++static unsigned
++get_vec_alignment_for_type (tree type)
++{
++ if (type == NULL_TREE)
++ return 0;
+
-+/* { dg-do compile } */
-+/* { dg-require-effective-target arm_arch_v6t2_ok } */
-+/* { dg-options "-O2" } */
-+/* { dg-add-options arm_arch_v6t2 } */
++ gcc_assert (TYPE_P (type));
+
-+int
-+fooctz (int i)
-+{
-+ return (i == 0) ? 32 : __builtin_ctz (i);
-+}
++ static unsigned alignment = 0;
++ switch (TREE_CODE (type))
++ {
++ case ARRAY_TYPE:
++ alignment = get_vec_alignment_for_array_type (type);
++ break;
++ case RECORD_TYPE:
++ alignment = get_vec_alignment_for_record_type (type);
++ break;
++ default:
++ alignment = 0;
++ break;
++ }
+
-+int
-+fooctz2 (int i)
-+{
-+ return (i != 0) ? __builtin_ctz (i) : 32;
++ return (alignment > TYPE_ALIGN (type)) ? alignment : 0;
+}
+
-+unsigned int
-+fooctz3 (unsigned int i)
-+{
-+ return (i > 0) ? __builtin_ctz (i) : 32;
-+}
++/* Entry point to increase_alignment pass. */
+ static unsigned int
+ increase_alignment (void)
+ {
+ varpool_node *vnode;
+
+ vect_location = UNKNOWN_LOCATION;
++ type_align_map = new hash_map<tree, unsigned>;
+
+ /* Increase the alignment of all global arrays for vectorization. */
+ FOR_EACH_DEFINED_VARIABLE (vnode)
+ {
+- tree vectype, decl = vnode->decl;
+- tree t;
++ tree decl = vnode->decl;
+ unsigned int alignment;
+
+- t = TREE_TYPE (decl);
+- if (TREE_CODE (t) != ARRAY_TYPE)
+- continue;
+- vectype = get_vectype_for_scalar_type (strip_array_types (t));
+- if (!vectype)
+- continue;
+- alignment = TYPE_ALIGN (vectype);
+- if (DECL_ALIGN (decl) >= alignment)
+- continue;
+-
+- if (vect_can_force_dr_alignment_p (decl, alignment))
++ if ((decl_in_symtab_p (decl)
++ && !symtab_node::get (decl)->can_increase_alignment_p ())
++ || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
++ continue;
++
++ alignment = get_vec_alignment_for_type (TREE_TYPE (decl));
++ if (alignment && vect_can_force_dr_alignment_p (decl, alignment))
+ {
+- vnode->increase_alignment (TYPE_ALIGN (vectype));
++ vnode->increase_alignment (alignment);
+ dump_printf (MSG_NOTE, "Increasing alignment of decl: ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, decl);
+ dump_printf (MSG_NOTE, "\n");
+ }
+ }
+
-+/* { dg-final { scan-assembler-times "rbit\t*" 3 } } */
++ delete type_align_map;
+ return 0;
+ }
+
+--- a/src/gcc/tree-vrp.c
++++ b/src/gcc/tree-vrp.c
+@@ -3130,6 +3130,24 @@ extract_range_from_binary_expr_1 (value_range *vr,
+ if (int_cst_range1 && tree_int_cst_sgn (vr1.min) >= 0)
+ wmax = wi::min (wmax, vr1.max, TYPE_SIGN (expr_type));
+ max = wide_int_to_tree (expr_type, wmax);
++ cmp = compare_values (min, max);
++ /* PR68217: In the case of signed & sign-bit-CST, the result
++ should be [-INF, 0] instead of [-INF, INF]. */
++ if (cmp == -2 || cmp == 1)
++ {
++ wide_int sign_bit
++ = wi::set_bit_in_zero (TYPE_PRECISION (expr_type) - 1,
++ TYPE_PRECISION (expr_type));
++ if (!TYPE_UNSIGNED (expr_type)
++ && ((value_range_constant_singleton (&vr0)
++ && !wi::cmps (vr0.min, sign_bit))
++ || (value_range_constant_singleton (&vr1)
++ && !wi::cmps (vr1.min, sign_bit))))
++ {
++ min = TYPE_MIN_VALUE (expr_type);
++ max = build_int_cst (expr_type, 0);
++ }
++ }
+ }
+ else if (code == BIT_IOR_EXPR)
+ {
+@@ -3824,7 +3842,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
+ arg = gimple_call_arg (stmt, 0);
+ if (TREE_CODE (arg) == SSA_NAME
+ && SSA_NAME_IS_DEFAULT_DEF (arg)
+- && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL)
++ && TREE_CODE (SSA_NAME_VAR (arg)) == PARM_DECL
++ && cfun->after_inlining)
+ {
+ set_value_range_to_null (vr, type);
+ return;
+@@ -9906,6 +9925,40 @@ simplify_internal_call_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
+ return true;
+ }
+
++/* Return true if VAR is a two-valued variable. Set *A and *B to the
++ two values when it is true. Return false otherwise. */
+
-+int
-+fooclz (int i)
++static bool
++two_valued_val_range_p (tree var, tree *a, tree *b)
+{
-+ return (i == 0) ? 32 : __builtin_clz (i);
-+}
++ value_range *vr = get_value_range (var);
++ if ((vr->type != VR_RANGE
++ && vr->type != VR_ANTI_RANGE)
++ || TREE_CODE (vr->min) != INTEGER_CST
++ || TREE_CODE (vr->max) != INTEGER_CST)
++ return false;
+
-+int
-+fooclz2 (int i)
-+{
-+ return (i != 0) ? __builtin_clz (i) : 32;
-+}
++ if (vr->type == VR_RANGE
++ && wi::sub (vr->max, vr->min) == 1)
++ {
++ *a = vr->min;
++ *b = vr->max;
++ return true;
++ }
+
-+unsigned int
-+fooclz3 (unsigned int i)
-+{
-+ return (i > 0) ? __builtin_clz (i) : 32;
++ /* ~[TYPE_MIN + 1, TYPE_MAX - 1] */
++ if (vr->type == VR_ANTI_RANGE
++ && wi::sub (vr->min, vrp_val_min (TREE_TYPE (var))) == 1
++ && wi::sub (vrp_val_max (TREE_TYPE (var)), vr->max) == 1)
++ {
++ *a = vrp_val_min (TREE_TYPE (var));
++ *b = vrp_val_max (TREE_TYPE (var));
++ return true;
++ }
++
++ return false;
+}
+
-+/* { dg-final { scan-assembler-times "clz\t" 6 } } */
-+/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
---- a/src/gcc/testsuite/lib/gcc-dg.exp
-+++ b/src/gcc/testsuite/lib/gcc-dg.exp
-@@ -403,6 +403,7 @@ if { [info procs ${tool}_load] != [list] \
- switch [lindex $result 0] {
- "pass" { set status "fail" }
- "fail" { set status "pass" }
-+ default { set status [lindex $result 0] }
- }
- set result [list $status [lindex $result 1]]
- }
---- a/src/gcc/testsuite/lib/target-supports.exp
-+++ b/src/gcc/testsuite/lib/target-supports.exp
-@@ -4382,6 +4382,8 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
- set et_vect_widen_sum_hi_to_si_pattern_saved 0
- if { [istarget powerpc*-*-*]
- || [istarget aarch64*-*-*]
-+ || ([istarget arm*-*-*] &&
-+ [check_effective_target_arm_neon_ok])
- || [istarget ia64-*-*] } {
- set et_vect_widen_sum_hi_to_si_pattern_saved 1
- }
---- a/src/gcc/tree-scalar-evolution.c
-+++ b/src/gcc/tree-scalar-evolution.c
-@@ -1937,6 +1937,36 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
- res = chrec_convert (type, chrec1, at_stmt);
- break;
+ /* Simplify STMT using ranges if possible. */
-+ case BIT_AND_EXPR:
-+ /* Given int variable A, handle A&0xffff as (int)(unsigned short)A.
-+ If A is SCEV and its value is in the range of representable set
-+ of type unsigned short, the result expression is a (no-overflow)
-+ SCEV. */
-+ res = chrec_dont_know;
-+ if (tree_fits_uhwi_p (rhs2))
+ static bool
+@@ -9916,6 +9969,68 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
+ {
+ enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+ tree rhs1 = gimple_assign_rhs1 (stmt);
++ tree rhs2 = gimple_assign_rhs2 (stmt);
++ tree lhs = gimple_assign_lhs (stmt);
++ tree val1 = NULL_TREE, val2 = NULL_TREE;
++ use_operand_p use_p;
++ gimple *use_stmt;
++
++ /* Convert:
++ LHS = CST BINOP VAR
++ Where VAR is two-valued and LHS is used in GIMPLE_COND only
++ To:
++ LHS = VAR == VAL1 ? (CST BINOP VAL1) : (CST BINOP VAL2)
++
++ Also handles:
++ LHS = VAR BINOP CST
++ Where VAR is two-valued and LHS is used in GIMPLE_COND only
++ To:
++ LHS = VAR == VAL1 ? (VAL1 BINOP CST) : (VAL2 BINOP CST) */
++
++ if (TREE_CODE_CLASS (rhs_code) == tcc_binary
++ && INTEGRAL_TYPE_P (TREE_TYPE (lhs))
++ && ((TREE_CODE (rhs1) == INTEGER_CST
++ && TREE_CODE (rhs2) == SSA_NAME)
++ || (TREE_CODE (rhs2) == INTEGER_CST
++ && TREE_CODE (rhs1) == SSA_NAME))
++ && single_imm_use (lhs, &use_p, &use_stmt)
++ && gimple_code (use_stmt) == GIMPLE_COND)
++
+ {
-+ int precision;
-+ unsigned HOST_WIDE_INT val = tree_to_uhwi (rhs2);
++ tree new_rhs1 = NULL_TREE;
++ tree new_rhs2 = NULL_TREE;
++ tree cmp_var = NULL_TREE;
+
-+ val ++;
-+ /* Skip if value of rhs2 wraps in unsigned HOST_WIDE_INT or
-+ it's not the maximum value of a smaller type than rhs1. */
-+ if (val != 0
-+ && (precision = exact_log2 (val)) > 0
-+ && (unsigned) precision < TYPE_PRECISION (TREE_TYPE (rhs1)))
++ if (TREE_CODE (rhs2) == SSA_NAME
++ && two_valued_val_range_p (rhs2, &val1, &val2))
+ {
-+ tree utype = build_nonstandard_integer_type (precision, 1);
++ /* Optimize RHS1 OP [VAL1, VAL2]. */
++ new_rhs1 = int_const_binop (rhs_code, rhs1, val1);
++ new_rhs2 = int_const_binop (rhs_code, rhs1, val2);
++ cmp_var = rhs2;
++ }
++ else if (TREE_CODE (rhs1) == SSA_NAME
++ && two_valued_val_range_p (rhs1, &val1, &val2))
++ {
++ /* Optimize [VAL1, VAL2] OP RHS2. */
++ new_rhs1 = int_const_binop (rhs_code, val1, rhs2);
++ new_rhs2 = int_const_binop (rhs_code, val2, rhs2);
++ cmp_var = rhs1;
++ }
+
-+ if (TYPE_PRECISION (utype) < TYPE_PRECISION (TREE_TYPE (rhs1)))
-+ {
-+ chrec1 = analyze_scalar_evolution (loop, rhs1);
-+ chrec1 = chrec_convert (utype, chrec1, at_stmt);
-+ res = chrec_convert (TREE_TYPE (rhs1), chrec1, at_stmt);
-+ }
++ /* If we could not find the two values, or the optimization is invalid as
++ in divide by zero, new_rhs1 / new_rhs2 will be NULL_TREE. */
++ if (new_rhs1 && new_rhs2)
++ {
++ tree cond = build2 (EQ_EXPR, TREE_TYPE (cmp_var), cmp_var, val1);
++ gimple_assign_set_rhs_with_ops (gsi,
++ COND_EXPR, cond,
++ new_rhs1,
++ new_rhs2);
++ update_stmt (gsi_stmt (*gsi));
++ return true;
+ }
+ }
-+ break;
+
+ switch (rhs_code)
+ {
+--- a/src/gcc/varasm.c
++++ b/src/gcc/varasm.c
+@@ -6772,6 +6772,15 @@ default_use_anchors_for_symbol_p (const_rtx symbol)
+ sections that should be marked as small in the section directive. */
+ if (targetm.in_small_data_p (decl))
+ return false;
++
++ /* Don't use section anchors for decls that won't fit inside a single
++ anchor range to reduce the number of instructions required to refer
++ to the entire declaration. */
++ if (decl && DECL_SIZE (decl)
++ && tree_to_shwi (DECL_SIZE (decl))
++ >= (targetm.max_anchor_offset * BITS_PER_UNIT))
++ return false;
+
- default:
- res = chrec_dont_know;
+ }
+ return true;
+ }
+--- a/src/libcpp/expr.c
++++ b/src/libcpp/expr.c
+@@ -1073,7 +1073,7 @@ eval_token (cpp_reader *pfile, const cpp_token *token,
+ result.low = 0;
+ if (CPP_OPTION (pfile, warn_undef) && !pfile->state.skip_eval)
+ cpp_warning_with_line (pfile, CPP_W_UNDEF, virtual_location, 0,
+- "\"%s\" is not defined",
++ "\"%s\" is not defined, evaluates to 0",
+ NODE_NAME (token->val.node.node));
+ }
break;
+--- a/src/libgcc/Makefile.in
++++ b/src/libgcc/Makefile.in
+@@ -414,8 +414,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2 \
+ _negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2 \
+ _ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2 \
+ _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2 \
+- _mulsc3 _muldc3 _mulxc3 _multc3 _divsc3 _divdc3 _divxc3 \
+- _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 _clrsbdi2
++ _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3 \
++ _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 \
++ _clrsbdi2
+
+ # The floating-point conversion routines that involve a single-word integer.
+ # XX stands for the integer mode.
+--- a/src/libgcc/config/arm/bpabi-v6m.S
++++ b/src/libgcc/config/arm/bpabi-v6m.S
+@@ -1,4 +1,5 @@
+-/* Miscellaneous BPABI functions. ARMv6M implementation
++/* Miscellaneous BPABI functions. Thumb-1 implementation, suitable for ARMv4T,
++ ARMv6-M and ARMv8-M Baseline like ISA variants.
+
+ Copyright (C) 2006-2016 Free Software Foundation, Inc.
+ Contributed by CodeSourcery.
--- a/src/libgcc/config/arm/ieee754-df.S
+++ b/src/libgcc/config/arm/ieee754-df.S
@@ -160,8 +160,8 @@ ARM_FUNC_ALIAS aeabi_dadd adddf3
@@ -9225,7 +127434,26 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
@ point. Otherwise the CFI would change to a different state after the branch,
@ which would be disastrous for backtracing.
LSYM(Lad_x):
-@@ -1158,8 +1158,8 @@ ARM_FUNC_ALIAS eqdf2 cmpdf2
+@@ -507,11 +507,15 @@ ARM_FUNC_ALIAS aeabi_f2d extendsfdf2
+ eorne xh, xh, #0x38000000 @ fixup exponent otherwise.
+ RETc(ne) @ and return it.
+
+- teq r2, #0 @ if actually 0
+- do_it ne, e
+- teqne r3, #0xff000000 @ or INF or NAN
++ bics r2, r2, #0xff000000 @ isolate mantissa
++ do_it eq @ if 0, that is ZERO or INF,
+ RETc(eq) @ we are done already.
+
++ teq r3, #0xff000000 @ check for NAN
++ do_it eq, t
++ orreq xh, xh, #0x00080000 @ change to quiet NAN
++ RETc(eq) @ and return it.
++
+ @ value was denormalized. We can normalize it now.
+ do_push {r4, r5, lr}
+ .cfi_adjust_cfa_offset 12 @ CFA is now sp + previousOffset + 12
+@@ -1158,8 +1162,8 @@ ARM_FUNC_ALIAS eqdf2 cmpdf2
1: str ip, [sp, #-4]!
.cfi_adjust_cfa_offset 4 @ CFA is now sp + previousOffset + 4.
@ We're not adding CFI for ip as it's pushed into the stack
@@ -9236,7 +127464,7 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
@ Trap any INF/NAN first.
mov ip, xh, lsl #1
-@@ -1169,14 +1169,14 @@ ARM_FUNC_ALIAS eqdf2 cmpdf2
+@@ -1169,14 +1173,14 @@ ARM_FUNC_ALIAS eqdf2 cmpdf2
COND(mvn,s,ne) ip, ip, asr #21
beq 3f
.cfi_remember_state
@@ -9259,3 +127487,313 @@ LANG=C git diff ac6fe0ee825550e1dfefffd649d49133011d5eb8..91b11ff9859dee06a84ac4
2: add sp, sp, #4
.cfi_adjust_cfa_offset -4 @ CFA is now sp + previousOffset.
+--- a/src/libgcc/config/arm/lib1funcs.S
++++ b/src/libgcc/config/arm/lib1funcs.S
+@@ -108,7 +108,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ # define __ARM_ARCH__ 7
+ #endif
+
+-#if defined(__ARM_ARCH_8A__)
++#if defined(__ARM_ARCH_8A__) || defined(__ARM_ARCH_8M_BASE__) \
++ || defined(__ARM_ARCH_8M_MAIN__)
+ # define __ARM_ARCH__ 8
+ #endif
+
+@@ -124,10 +125,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ && !defined(__thumb2__) \
+ && (!defined(__THUMB_INTERWORK__) \
+ || defined (__OPTIMIZE_SIZE__) \
+- || defined(__ARM_ARCH_6M__)))
++ || !__ARM_ARCH_ISA_ARM))
+ # define __prefer_thumb__
+ #endif
+
++#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
++#define NOT_ISA_TARGET_32BIT 1
++#endif
++
+ /* How to return from a function call depends on the architecture variant. */
+
+ #if (__ARM_ARCH__ > 4) || defined(__ARM_ARCH_4T__)
+@@ -305,7 +310,7 @@ LSYM(Lend_fde):
+
+ #ifdef __ARM_EABI__
+ .macro THUMB_LDIV0 name signed
+-#if defined(__ARM_ARCH_6M__)
++#ifdef NOT_ISA_TARGET_32BIT
+ .ifc \signed, unsigned
+ cmp r0, #0
+ beq 1f
+@@ -478,7 +483,7 @@ _L__\name:
+
+ #else /* !(__INTERWORKING_STUBS__ || __thumb2__) */
+
+-#ifdef __ARM_ARCH_6M__
++#ifdef NOT_ISA_TARGET_32BIT
+ #define EQUIV .thumb_set
+ #else
+ .macro ARM_FUNC_START name sp_section=
+@@ -510,7 +515,7 @@ SYM (__\name):
+ #endif
+ .endm
+
+-#ifndef __ARM_ARCH_6M__
++#ifndef NOT_ISA_TARGET_32BIT
+ .macro ARM_FUNC_ALIAS new old
+ .globl SYM (__\new)
+ EQUIV SYM (__\new), SYM (__\old)
+@@ -1054,7 +1059,7 @@ ARM_FUNC_START aeabi_uidivmod
+ /* ------------------------------------------------------------------------ */
+ #ifdef L_umodsi3
+
+-#ifdef __ARM_ARCH_EXT_IDIV__
++#if defined(__ARM_ARCH_EXT_IDIV__) && __ARM_ARCH_ISA_THUMB != 1
+
+ ARM_FUNC_START umodsi3
+
+@@ -1240,7 +1245,7 @@ ARM_FUNC_START aeabi_idivmod
+ /* ------------------------------------------------------------------------ */
+ #ifdef L_modsi3
+
+-#if defined(__ARM_ARCH_EXT_IDIV__)
++#if defined(__ARM_ARCH_EXT_IDIV__) && __ARM_ARCH_ISA_THUMB != 1
+
+ ARM_FUNC_START modsi3
+
+@@ -1508,14 +1513,15 @@ LSYM(Lover12):
+
+ #endif /* __symbian__ */
+
+-#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
+- || defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
+- || defined(__ARM_ARCH_5TEJ__)
++#if (__ARM_ARCH_ISA_THUMB == 2 \
++ || (__ARM_ARCH_ISA_ARM \
++ && (__ARM_ARCH__ > 5 \
++ || (__ARM_ARCH__ == 5 && __ARM_ARCH_ISA_THUMB))))
+ #define HAVE_ARM_CLZ 1
+ #endif
+
+ #ifdef L_clzsi2
+-#if defined(__ARM_ARCH_6M__)
++#ifdef NOT_ISA_TARGET_32BIT
+ FUNC_START clzsi2
+ mov r1, #28
+ mov r3, #1
+@@ -1576,7 +1582,7 @@ ARM_FUNC_START clzsi2
+ #ifdef L_clzdi2
+ #if !defined(HAVE_ARM_CLZ)
+
+-# if defined(__ARM_ARCH_6M__)
++# ifdef NOT_ISA_TARGET_32BIT
+ FUNC_START clzdi2
+ push {r4, lr}
+ # else
+@@ -1601,7 +1607,7 @@ ARM_FUNC_START clzdi2
+ bl __clzsi2
+ # endif
+ 2:
+-# if defined(__ARM_ARCH_6M__)
++# ifdef NOT_ISA_TARGET_32BIT
+ pop {r4, pc}
+ # else
+ RETLDM r4
+@@ -1623,7 +1629,7 @@ ARM_FUNC_START clzdi2
+ #endif /* L_clzdi2 */
+
+ #ifdef L_ctzsi2
+-#if defined(__ARM_ARCH_6M__)
++#ifdef NOT_ISA_TARGET_32BIT
+ FUNC_START ctzsi2
+ neg r1, r0
+ and r0, r0, r1
+@@ -1738,7 +1744,7 @@ ARM_FUNC_START ctzsi2
+
+ /* Don't bother with the old interworking routines for Thumb-2. */
+ /* ??? Maybe only omit these on "m" variants. */
+-#if !defined(__thumb2__) && !defined(__ARM_ARCH_6M__)
++#if !defined(__thumb2__) && __ARM_ARCH_ISA_ARM
+
+ #if defined L_interwork_call_via_rX
+
+@@ -1983,11 +1989,12 @@ LSYM(Lchange_\register):
+ .endm
+
+ #ifndef __symbian__
+-#ifndef __ARM_ARCH_6M__
++/* The condition here must match the one in gcc/config/arm/elf.h. */
++#ifndef NOT_ISA_TARGET_32BIT
+ #include "ieee754-df.S"
+ #include "ieee754-sf.S"
+ #include "bpabi.S"
+-#else /* __ARM_ARCH_6M__ */
++#else /* NOT_ISA_TARGET_32BIT */
+ #include "bpabi-v6m.S"
+-#endif /* __ARM_ARCH_6M__ */
++#endif /* NOT_ISA_TARGET_32BIT */
+ #endif /* !__symbian__ */
+--- a/src/libgcc/config/arm/libunwind.S
++++ b/src/libgcc/config/arm/libunwind.S
+@@ -58,7 +58,7 @@
+ #endif
+ #endif
+
+-#ifdef __ARM_ARCH_6M__
++#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
+
+ /* r0 points to a 16-word block. Upload these values to the actual core
+ state. */
+@@ -169,7 +169,7 @@ FUNC_START gnu_Unwind_Save_WMMXC
+ UNPREFIX \name
+ .endm
+
+-#else /* !__ARM_ARCH_6M__ */
++#else /* __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 */
+
+ /* r0 points to a 16-word block. Upload these values to the actual core
+ state. */
+@@ -351,7 +351,7 @@ ARM_FUNC_START gnu_Unwind_Save_WMMXC
+ UNPREFIX \name
+ .endm
+
+-#endif /* !__ARM_ARCH_6M__ */
++#endif /* __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 */
+
+ UNWIND_WRAPPER _Unwind_RaiseException 1
+ UNWIND_WRAPPER _Unwind_Resume 1
+--- a/src/libgcc/config/arm/t-softfp
++++ b/src/libgcc/config/arm/t-softfp
+@@ -1,2 +1,2 @@
+-softfp_wrap_start := '\#ifdef __ARM_ARCH_6M__'
++softfp_wrap_start := '\#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1'
+ softfp_wrap_end := '\#endif'
+--- a/src/libgcc/libgcc2.c
++++ b/src/libgcc/libgcc2.c
+@@ -1852,7 +1852,8 @@ NAME (TYPE x, int m)
+
+ #endif
+
+-#if ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
++#if((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
++ || ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
+ || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
+ || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
+ || ((defined(L_multc3) || defined(L_divtc3)) && LIBGCC2_HAS_TF_MODE)
+@@ -1861,7 +1862,13 @@ NAME (TYPE x, int m)
+ #undef double
+ #undef long
+
+-#if defined(L_mulsc3) || defined(L_divsc3)
++#if defined(L_mulhc3) || defined(L_divhc3)
++# define MTYPE HFtype
++# define CTYPE HCtype
++# define MODE hc
++# define CEXT __LIBGCC_HF_FUNC_EXT__
++# define NOTRUNC (!__LIBGCC_HF_EXCESS_PRECISION__)
++#elif defined(L_mulsc3) || defined(L_divsc3)
+ # define MTYPE SFtype
+ # define CTYPE SCtype
+ # define MODE sc
+@@ -1922,7 +1929,7 @@ extern void *compile_type_assert[sizeof(INFINITY) == sizeof(MTYPE) ? 1 : -1];
+ # define TRUNC(x) __asm__ ("" : "=m"(x) : "m"(x))
+ #endif
+
+-#if defined(L_mulsc3) || defined(L_muldc3) \
++#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
+ || defined(L_mulxc3) || defined(L_multc3)
+
+ CTYPE
+@@ -1992,7 +1999,7 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
+ }
+ #endif /* complex multiply */
+
+-#if defined(L_divsc3) || defined(L_divdc3) \
++#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
+ || defined(L_divxc3) || defined(L_divtc3)
+
+ CTYPE
+--- a/src/libgcc/libgcc2.h
++++ b/src/libgcc/libgcc2.h
+@@ -34,6 +34,12 @@ extern void __clear_cache (char *, char *);
+ extern void __eprintf (const char *, const char *, unsigned int, const char *)
+ __attribute__ ((__noreturn__));
+
++#ifdef __LIBGCC_HAS_HF_MODE__
++#define LIBGCC2_HAS_HF_MODE 1
++#else
++#define LIBGCC2_HAS_HF_MODE 0
++#endif
++
+ #ifdef __LIBGCC_HAS_SF_MODE__
+ #define LIBGCC2_HAS_SF_MODE 1
+ #else
+@@ -133,6 +139,10 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
+ #endif
+ #endif
+
++#if LIBGCC2_HAS_HF_MODE
++typedef float HFtype __attribute__ ((mode (HF)));
++typedef _Complex float HCtype __attribute__ ((mode (HC)));
++#endif
+ #if LIBGCC2_HAS_SF_MODE
+ typedef float SFtype __attribute__ ((mode (SF)));
+ typedef _Complex float SCtype __attribute__ ((mode (SC)));
+@@ -424,6 +434,10 @@ extern SItype __negvsi2 (SItype);
+ #endif /* COMPAT_SIMODE_TRAPPING_ARITHMETIC */
+
+ #undef int
++#if LIBGCC2_HAS_HF_MODE
++extern HCtype __divhc3 (HFtype, HFtype, HFtype, HFtype);
++extern HCtype __mulhc3 (HFtype, HFtype, HFtype, HFtype);
++#endif
+ #if LIBGCC2_HAS_SF_MODE
+ extern DWtype __fixsfdi (SFtype);
+ extern SFtype __floatdisf (DWtype);
+--- a/src/libstdc++-v3/acinclude.m4
++++ b/src/libstdc++-v3/acinclude.m4
+@@ -632,10 +632,10 @@ dnl baseline_dir
+ dnl baseline_subdir_switch
+ dnl
+ AC_DEFUN([GLIBCXX_CONFIGURE_TESTSUITE], [
+- if $GLIBCXX_IS_NATIVE ; then
+- # Do checks for resource limit functions.
+- GLIBCXX_CHECK_SETRLIMIT
++ # Do checks for resource limit functions.
++ GLIBCXX_CHECK_SETRLIMIT
+
++ if $GLIBCXX_IS_NATIVE ; then
+ # Look for setenv, so that extended locale tests can be performed.
+ GLIBCXX_CHECK_STDLIB_DECL_AND_LINKAGE_3(setenv)
+ fi
+--- a/src/libstdc++-v3/configure
++++ b/src/libstdc++-v3/configure
+@@ -79456,8 +79456,7 @@ $as_echo "$ac_cv_x86_rdrand" >&6; }
+
+ # This depends on GLIBCXX_ENABLE_SYMVERS and GLIBCXX_IS_NATIVE.
+
+- if $GLIBCXX_IS_NATIVE ; then
+- # Do checks for resource limit functions.
++ # Do checks for resource limit functions.
+
+ setrlimit_have_headers=yes
+ for ac_header in unistd.h sys/time.h sys/resource.h
+@@ -79686,6 +79685,7 @@ $as_echo "#define _GLIBCXX_RES_LIMITS 1" >>confdefs.h
+ $as_echo "$ac_res_limits" >&6; }
+
+
++ if $GLIBCXX_IS_NATIVE ; then
+ # Look for setenv, so that extended locale tests can be performed.
+
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for setenv declaration" >&5
+--- a/src/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
++++ b/src/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
+@@ -15,7 +15,8 @@
+ // with this library; see the file COPYING3. If not see
+ // <http://www.gnu.org/licenses/>.
+
+-// { dg-do run { target x86_64-*-linux* powerpc*-*-linux* } }
++// { dg-do run }
++// { dg-require-atomic-builtins "" }
+ // { dg-options "-std=gnu++11 -O0" }
+
+ #include <atomic>
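
As an aside on the tree-vrp hunk near the top of this diff: the new two_valued_val_range_p path rewrites a binary operation whose SSA operand is known to take exactly two values into a conditional that selects between the two constant-folded results. A minimal hand-written C sketch of the source-level effect follows; the function names and the constants 4 and 8 are invented for illustration and are not output of the pass itself.

/* Sketch: suppose range analysis proves y is either 4 or 8.  The pass
   turns "x & y" into a selection between the two folded results, which
   later optimizations can simplify further.  */
#include <stdio.h>

static int before (int x, int y)        /* y known to be 4 or 8 */
{
  return x & y;
}

static int after (int x, int y)         /* equivalent form after the rewrite */
{
  return (y == 4) ? (x & 4) : (x & 8);
}

int main (void)
{
  for (int y = 4; y <= 8; y += 4)
    for (int x = 0; x < 16; x++)
      if (before (x, y) != after (x, y))
        printf ("mismatch at x=%d y=%d\n", x, y);
  return 0;
}
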
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/gcc-6.git