Bug#863672: performance critical libyuv built with Os

Laurent Bigonville bigon at debian.org
Fri Jun 2 15:23:16 UTC 2017


tag 863672 + patch fixed-upstream
thanks

On Mon, 29 May 2017 23:14:38 +0200 Julian Taylor 
<jtaylor.debian at googlemail.com> wrote:

 >
 > libyuv, which is a performance-critical library for Firefox, is built with
 > -Os, which is horrible for its performance.
 > In particular row_common.cc which contains the generic parts of the
 > color transformation code:
 >
 > See:
 > https://buildd.debian.org/status/fetch.php?pkg=firefox&arch=amd64&ver=53.0.is.52.0.2-1&stamp=1492644908&raw=0
 >
 > /usr/bin/g++ -std=gnu++11 -o row_common.o -c ... -fPIC
 > -DMOZILLA_CLIENT -include
 > /<<PKGBUILDDIR>>/build-browser/mozilla-config.h -MD -MP -MF
 > .deps/row_common.o.pp -Wdate-time -D_FORTIFY_SOURCE=2 -Wall
 > -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual
 > -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code
 > -Wwrite-strings -Wno-invalid-offsetof -Wc++14-compat
 > -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations
 > -Wno-error=array-bounds -fno-lifetime-dse -fstack-protector-strong
 > -Wformat -Werror=format-security -fno-schedule-insns2 -fno-lifetime-dse
 > -fno-delete-null-pointer-checks -fno-exceptions -fno-strict-aliasing
 > -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
 > -fno-math-errno -pthread -pipe -g -freorder-blocks -Os
 > -fomit-frame-pointer
 > /<<PKGBUILDDIR>>/media/libyuv/source/row_common.cc
 >
 >
 > The problematic part is the YuvPixel function which is called in loops
 > and in turn calls tiny clamp functions.
 > -Os heavily restricts inlining, so this causes massive overhead.
 > This is the top cpu profile on sites which e.g. display videos.
 > 17.25% libxul.so [.] YuvPixel
 > 6.58% libxul.so [.] Clamp
 > 6.46% libxul.so [.] clamp255
 >
 > The problem is not as bad as it looks as this generic code is only
 > executed on machines that do not have SSSE3, AVX2 or NEON (see
 > convert_argb.cc).
 > But there are still plenty of useful CPUs that do not have these
 > instruction sets and are crippled by the compiler flags used.
 >
 > Is it possible to compile this library with -O3 to allow the compiler to
 > vectorize it with the best available generic instruction set (e.g. SSE2
 > on x64)?

FTR, this is fixed upstream now; -O2 is used by default on desktop builds:

https://hg.mozilla.org/integration/autoland/rev/8fdb9e30b6a7
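
For readers unfamiliar with the code path in the quoted report, here is a minimal sketch of the pattern: a per-pixel conversion function called in a row loop, which in turn calls tiny clamp helpers. This is not the libyuv source. All names prefixed with Sketch are hypothetical, and the coefficients are approximate full-range BT.601 values chosen for illustration, not libyuv's constants. If the helpers are not inlined, every pixel pays for several function calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical clamp helpers, modeled loosely on the symbol names seen in
// the profile (Clamp, clamp255). Not the libyuv implementation.
static inline int32_t SketchClamp0(int32_t v) { return v < 0 ? 0 : v; }
static inline int32_t SketchClamp255(int32_t v) { return v > 255 ? 255 : v; }
static inline uint8_t SketchClamp(int32_t v) {
  return static_cast<uint8_t>(SketchClamp255(SketchClamp0(v)));
}

// Converts one YUV pixel to RGB using 8.8 fixed-point arithmetic.
// Coefficients (assumption): 1.402, 0.344, 0.714, 1.772 scaled by 256.
static inline void SketchYuvPixel(uint8_t y, uint8_t u, uint8_t v,
                                  uint8_t* r, uint8_t* g, uint8_t* b) {
  const int32_t du = static_cast<int32_t>(u) - 128;
  const int32_t dv = static_cast<int32_t>(v) - 128;
  *r = SketchClamp(y + (359 * dv) / 256);
  *g = SketchClamp(y - (88 * du + 183 * dv) / 256);
  *b = SketchClamp(y + (454 * du) / 256);
}

// Converts a row of non-subsampled YUV samples to packed RGB (3 bytes per
// pixel). The per-pixel call inside this loop is where missing inlining
// becomes expensive.
void SketchYuvToRgbRow(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                       uint8_t* rgb, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    SketchYuvPixel(y[x], u[x], v[x],
                   &rgb[3 * x], &rgb[3 * x + 1], &rgb[3 * x + 2]);
  }
}
```

Comparing the assembly from g++ -Os -S and g++ -O2 -S for a file like this is one way to check whether the helper calls survive in the loop. Whether they do depends on the compiler version and the surrounding code, since inline is only a hint.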
