[SCM] x265/master: New upstream version 2.5
sramacher at users.alioth.debian.org
sramacher at users.alioth.debian.org
Mon Jul 17 18:06:22 UTC 2017
The following commit has been merged in the master branch:
commit 89200355eee42f6c7d9643ba6f8b54fa4313af7c
Author: Sebastian Ramacher <sramacher at debian.org>
Date: Mon Jul 17 20:01:34 2017 +0200
New upstream version 2.5
diff --git a/.hg_archival.txt b/.hg_archival.txt
index bdebac4..5c5835b 100644
--- a/.hg_archival.txt
+++ b/.hg_archival.txt
@@ -1,4 +1,4 @@
repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: e7a4dd48293b7956d4a20df257d23904cc78e376
+node: 64b2d0bf45a52511e57a6b7299160b961ca3d51c
branch: stable
-tag: 2.4
+tag: 2.5
diff --git a/.hgtags b/.hgtags
index 0e65da4..4a074b6 100644
--- a/.hgtags
+++ b/.hgtags
@@ -22,3 +22,4 @@ e27327f5da35c5feb660360336fdc94bd0afe719 1.8
981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
3037c1448549ca920967831482c653e5892fa8ed 2.3
+e7a4dd48293b7956d4a20df257d23904cc78e376 2.4
diff --git a/doc/reST/api.rst b/doc/reST/api.rst
index df9f380..1706ab7 100644
--- a/doc/reST/api.rst
+++ b/doc/reST/api.rst
@@ -192,6 +192,12 @@ changes made to the parameters for auto-detection and other reasons::
* presets is not recommended without a more fine-grained breakdown of
* parameters to take this into account. */
int x265_encoder_reconfig(x265_encoder *, x265_param *);
+**x265_encoder_ctu_info**
+ /* x265_encoder_ctu_info:
+ * Copy CTU information such as ctu address and ctu partition structure of all
+ * CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+ * the encoder will wait for this copy to complete if enabled.
+ */
Pictures
========
@@ -341,6 +347,14 @@ statistics from the encoder::
Cleanup
=======
+At the end of the encode, the application will want to trigger logging
+of the final encode statistics, if :option:`--csv` had been specified::
+
+ /* x265_encoder_log:
+ * write a line to the configured CSV file. If a CSV filename was not
+ * configured, or file open failed, this function will perform no write. */
+ void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
+
Finally, the encoder must be closed in order to free all of its
resources. An encoder that has been flushed cannot be restarted and
reused. Once **x265_encoder_close()** has been called, the encoder
diff --git a/doc/reST/cli.rst b/doc/reST/cli.rst
index 2543bfc..f0a5ff8 100644
--- a/doc/reST/cli.rst
+++ b/doc/reST/cli.rst
@@ -52,8 +52,7 @@ Command line executable return codes::
2. unable to open encoder
3. unable to generate stream headers
4. encoder abort
- 5. unable to open csv file
-
+
Logging/Statistic Options
=========================
@@ -83,9 +82,66 @@ Logging/Statistic Options
it adds one line per run. If :option:`--csv-log-level` is greater than
0, it writes one line per frame. Default none
- Several frame performance statistics are available when
- :option:`--csv-log-level` is greater than or equal to 2:
-
+ The following statistics are available when :option:`--csv-log-level` is
+ greater than or equal to 1:
+
+ **Encode Order** The frame order in which the encoder encodes.
+
+ **Type** Slice type of the frame.
+
+ **POC** Picture Order Count - The display order of the frames.
+
+ **QP** Quantization Parameter decided for the frame.
+
+ **Bits** Number of bits consumed by the frame.
+
+ **Scenecut** 1 if the frame is a scenecut, 0 otherwise.
+
+ **RateFactor** Applicable only when CRF is enabled. The rate factor depends
+ on the CRF given by the user. This is used to determine the QP so as to
+ target a certain quality.
+
+ **BufferFill** Bits available for the next frame. Includes bits carried
+ over from the current frame.
+
+ **Latency** Latency in terms of number of frames between when the frame
+ was given in and when the frame is given out.
+
+ **PSNR** Peak signal to noise ratio for Y, U and V planes.
+
+ **SSIM** A quality metric that denotes the structural similarity between frames.
+
+ **Ref lists** POC of references in lists 0 and 1 for the frame.
+
+ Several statistics about the encoded bitstream and encoder performance are
+ available when :option:`--csv-log-level` is greater than or equal to 2:
+
+ **I/P cost ratio:** The ratio between the cost when a frame is decided as an
+ I frame to that when it is decided as a P frame as computed from the
+ quarter-resolution frame in look-ahead. This, in combination with other parameters
+ such as position of the frame in the GOP, is used to decide scene transitions.
+
+ **Analysis statistics:**
+
+ **CU Statistics** percentage of CU modes.
+
+ **Distortion** Average luma and chroma distortion. Calculated as
+ SSE is done on fenc and recon(after quantization).
+
+ **Psy Energy** Average psy energy calculated as the sum of absolute
+ difference between source and recon energy. Energy is measured by sa8d
+ minus SAD.
+
+ **Residual Energy** Average residual energy. SSE is calculated on fenc
+ and pred(before quantization).
+
+ **Luma/Chroma Values** minumum, maximum and average(averaged by area)
+ luma and chroma values of source for each frame.
+
+ **PU Statistics** percentage of PU modes at each depth.
+
+ **Performance statistics:**
+
**DecideWait ms** number of milliseconds the frame encoder had to
wait, since the previous frame was retrieved by the API thread,
before a new frame has been given to it. This is the latency
@@ -111,6 +167,8 @@ Logging/Statistic Options
**Stall Time ms** the number of milliseconds of the reported wall
time that were spent with zero worker threads, aka all compression
was completely stalled.
+
+ **Total frame time** Total time spent to encode the frame.
**Avg WPP** the average number of worker threads working on this
frame, at any given time. This value is sampled at the completion of
@@ -123,8 +181,6 @@ Logging/Statistic Options
is more of a problem for P frames where some blocks are much more
expensive than others.
- **CLI ONLY**
-
.. option:: --csv-log-level <integer>
Controls the level of detail (and size) of --csv log files
@@ -133,8 +189,6 @@ Logging/Statistic Options
1. frame level logging
2. frame level logging with performance statistics
- **CLI ONLY**
-
.. option:: --ssim, --no-ssim
Calculate and report Structural Similarity values. It is
@@ -795,33 +849,31 @@ the prediction quad-tree.
Analysis re-use options, to improve performance when encoding the same
sequence multiple times (presumably at varying bitrates). The encoder
-will not reuse analysis if the resolution and slice type parameters do
-not match.
+will not reuse analysis if slice type parameters do not match.
-.. option:: --analysis-mode <string|int>
+.. option:: --analysis-reuse-mode <string|int>
- Specify whether analysis information of each frame is output by encoder
- or input for reuse. By reading the analysis data writen by an
- earlier encode of the same sequence, substantial redundant work may
- be avoided.
-
- The following data may be stored and reused:
- I frames - split decisions and luma intra directions of all CUs.
- P/B frames - motion vectors are dumped at each depth for all CUs.
+ This option allows reuse of analysis information from first pass to second pass.
+ :option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame.
+ :option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass.
+ There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is
+ written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused
+ is determined by :option:`--analysis-reuse-level`. By reading the analysis data writen by an earlier encode
+ of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0.
**Values:** off(0), save(1): dump analysis data, load(2): read analysis data
-.. option:: --analysis-file <filename>
+.. option:: --analysis-reuse-file <filename>
- Specify a filename for analysis data (see :option:`--analysis-mode`)
+ Specify a filename for analysis data (see :option:`--analysis-reuse-mode`)
If no filename is specified, x265_analysis.dat is used.
-.. option:: --refine-level <1..10>
+.. option:: --analysis-reuse-level <1..10>
- Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels.
+ Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels.
Higher the value, higher the information stored/reused, faster the encode. Default 5.
- Note that --refine-level must be paired with analysis-mode.
+ Note that --analysis-reuse-level must be paired with analysis-reuse-mode.
+--------+-----------------------------------------+
| Level | Description |
@@ -835,6 +887,41 @@ not match.
| 10 | Level 5 + Full CU analysis-info |
+--------+-----------------------------------------+
+.. option:: --scale-factor
+
+ Factor by which input video is scaled down for analysis save mode.
+ This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10.
+ The ctu size of load should be double the size of save. Default 0.
+
+.. option:: --refine-intra <0|1|2>
+
+ Enables refinement of intra blocks in current encode.
+
+ Level 0 - Forces both mode and depth from the previous encode.
+
+ Level 1 - Evaluates all intra modes for blocks of size one smaller than
+ the min-cu-size of the incoming analysis data from the previous encode,
+ forces modes for blocks of larger size.
+
+ Level 2 - Evaluates all intra modes for blocks of size one smaller than
+ the min-cu-size of the incoming analysis data from the previous encode.
+ For larger blocks, force only depth when angular mode is chosen by the
+ previous encode, force depth and mode when other intra modes are chosen.
+
+ Default 0.
+
+.. option:: --refine-inter-depth
+
+ Enables refinement of inter blocks in current encode. Evaluates all
+ inter modes for blocks of size one smaller than the min-cu-size of the
+ incoming analysis data from the previous encode. Default disabled.
+
+.. option:: --refine-mv
+
+ Enables refinement of motion vector for scaled video. Evaluates the best
+ motion vector by searching the surrounding eight integer and subpel pixel
+ positions.
+
Options which affect the transform unit quad-tree, sometimes referred to
as the residual quad-tree (RQT).
@@ -1221,7 +1308,16 @@ Slice decision options
intra cost of a frame used in scenecut detection. For example, a value of 5 indicates,
if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,
then detect this frame as scenecut. Values between 5 and 15 are recommended. Default 5.
-
+
+.. option:: --ctu-info <0, 1, 2, 4, 6>
+
+ This value enables receiving CTU information asynchronously and determine reaction to the CTU information. Default 0.
+ 1: force the partitions if CTU information is present.
+ 2: functionality of (1) and reduce qp if CTU information has changed.
+ 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise.
+ This option should be enabled only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously.
+ If enabled without calling the API function, the encoder will wait indefinitely.
+
.. option:: --intra-refresh
Enables Periodic Intra Refresh(PIR) instead of keyframe insertion.
@@ -1491,7 +1587,11 @@ Quality, rate control and rate distortion options
and also redundant steps are skipped.
In pass 1 analysis information like motion vector, depth, reference and prediction
modes of the final best CTU partition is stored for each CTU.
- Default disabled.
+ Multipass analysis refinement cannot be enabled when 'analysis-save/analysis-load' option
+ is enabled and both will be disabled when enabled together. This feature requires 'pmode/pme'
+ to be disabled and hence pmode/pme will be disabled when enabled at the same time.
+
+ Default: disabled.
.. option:: --multi-pass-opt-distortion, --no-multi-pass-opt-distortion
@@ -1499,7 +1599,11 @@ Quality, rate control and rate distortion options
ratecontrol. In pass 1 distortion of best CTU partition is stored. CTUs with high
distortion get lower(negative)qp offsets and vice-versa for low distortion CTUs in pass 2.
This helps to improve the subjective quality.
- Default disabled.
+ Multipass refinement of qp cannot be enabled when 'analysis-save/analysis-load' option
+ is enabled and both will be disabled when enabled together. 'multi-pass-opt-distortion'
+ requires 'pmode/pme' to be disabled and hence pmode/pme will be disabled when enabled along with it.
+
+ Default: disabled.
.. option:: --strict-cbr, --no-strict-cbr
@@ -1573,6 +1677,11 @@ Quality, rate control and rate distortion options
that this option is used through the tune grain feature where a combination
of param options are used to improve visual quality.
+.. option:: --const-vbv, --no-const-vbv
+
+ Enables VBV algorithm to be consistent across runs. Default disabled.
+ Enabled when :option:'--tune' grain is applied.
+
.. option:: --qblur <float>
Temporally blur quants. Default 0.5
@@ -1879,7 +1988,12 @@ VUI fields must be manually specified.
.. option:: --dhdr10-info <filename>
- Inserts tone mapping information as an SEI message.
+ Inserts tone mapping information as an SEI message. It takes as input,
+ the path to the JSON file containing the Creative Intent Metadata
+ to be encoded as Dynamic Tone Mapping into the bitstream.
+
+ Click `here <https://www.sra.samsung.com/assets/User-data-registered-itu-t-t35-SEI-message-for-ST-2094-40-v1.1.pdf>`_
+ for the syntax of the metadata file. A sample JSON file is available in `the downloads page <https://bitbucket.org/multicoreware/x265/downloads/DCIP3_4K_to_400_dynamic.json>`_
.. option:: --dhdr10-opt, --no-dhdr10-opt
diff --git a/doc/reST/releasenotes.rst b/doc/reST/releasenotes.rst
index 65264a1..bf88bf2 100644
--- a/doc/reST/releasenotes.rst
+++ b/doc/reST/releasenotes.rst
@@ -2,8 +2,33 @@
Release Notes
*************
-Release Notes
-*************
+Version 2.5
+===========
+
+Release date - 13th July, 2017.
+
+Encoder enhancements
+--------------------
+1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps.
+2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance.
+3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library.
+4. Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances.
+5. x265 can now generate a seprate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library.
+6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels.
+7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields.
+8. x86inc.asm cleaned-up for improved instruction handling.
+
+API changes
+-----------
+1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately.
+2. Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder.
+3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore.
+4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise.
+
+Bug fixes
+---------
+1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc.
+2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified.
Version 2.4
===========
diff --git a/source/CMakeLists.txt b/source/CMakeLists.txt
index acdeb7b..a012dd4 100644
--- a/source/CMakeLists.txt
+++ b/source/CMakeLists.txt
@@ -29,7 +29,7 @@ option(NATIVE_BUILD "Target the build CPU" OFF)
option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
# X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 116)
+set(X265_BUILD 130)
configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -182,12 +182,19 @@ if(CC STREQUAL "xlc")
add_definitions(-O3 -qstrict -qhot -qaltivec)
add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/)
endif()
-
-
+# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
+option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF)
if(GCC)
add_definitions(-Wall -Wextra -Wshadow)
add_definitions(-D__STDC_LIMIT_MACROS=1)
- add_definitions(-std=gnu++98)
+ if(ENABLE_HDR10_PLUS)
+ if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8")
+ message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus")
+ endif()
+ add_definitions(-std=gnu++11)
+ else()
+ add_definitions(-std=gnu++98)
+ endif()
if(ENABLE_PIC)
add_definitions(-fPIC)
endif(ENABLE_PIC)
@@ -363,14 +370,12 @@ if(HIGH_BIT_DEPTH)
else(HIGH_BIT_DEPTH)
add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8)
endif(HIGH_BIT_DEPTH)
-# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
-option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF)
-if (ENABLE_DYNAMIC_HDR10)
- add_subdirectory(dynamicHDR10)
- include_directories(dynamicHDR10)
- add_definitions(-DENABLE_DYNAMIC_HDR10)
-endif(ENABLE_DYNAMIC_HDR10)
+if (ENABLE_HDR10_PLUS)
+ include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}")
+ add_subdirectory(dynamicHDR10)
+ add_definitions(-DENABLE_HDR10_PLUS)
+endif(ENABLE_HDR10_PLUS)
# this option can only be used when linking multiple libx265 libraries
# together, and some alternate API access method is implemented.
option(EXPORT_C_API "Implement public C programming interface" ON)
@@ -510,8 +515,10 @@ if((MSVC_IDE OR XCODE OR GCC) AND ENABLE_ASSEMBLY)
endif()
endif()
source_group(ASM FILES ${ASM_SRCS})
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
+ add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>)
+ set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
else()
add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
endif()
@@ -524,6 +531,12 @@ endif()
install(TARGETS x265-static
LIBRARY DESTINATION ${LIB_INSTALL_DIR}
ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+
+if(ENABLE_HDR10_PLUS)
+ install(TARGETS hdr10plus-static
+ LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+ ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+endif()
install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include)
if(CMAKE_RC_COMPILER)
@@ -547,10 +560,16 @@ if(NOT (MSVC_IDE OR XCODE))
endif()
option(ENABLE_SHARED "Build shared library" ON)
if(ENABLE_SHARED)
-
- if(ENABLE_DYNAMIC_HDR10)
+ if(ENABLE_HDR10_PLUS)
add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>)
+ add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>)
+
+ if(MSVC)
+ set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus)
+ else()
+ set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus)
+ endif()
else()
add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>)
@@ -585,6 +604,11 @@ if(ENABLE_SHARED)
ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
RUNTIME DESTINATION ${BIN_INSTALL_DIR})
endif()
+ if(ENABLE_HDR10_PLUS)
+ install(TARGETS hdr10plus-shared
+ LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+ ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+ endif()
if(LINKER_OPTIONS)
# set_target_properties can't do list expansion
string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}")
@@ -646,18 +670,18 @@ if(ENABLE_CLI)
endif(WIN32)
if(XCODE)
# Xcode seems unable to link the CLI with libs, so link as one targget
- if(ENABLE_DYNAMIC_HDR10)
+ if(ENABLE_HDR10_PLUS)
add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
- x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+ x265.cpp x265.h x265cli.h
$<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
else()
add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
- x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+ x265.cpp x265.h x265cli.h
$<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
endif()
else()
add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
- ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp)
+ ${ExportDefs} x265.cpp x265.h x265cli.h)
if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
# The CLI cannot link to the shared library on Windows, it
# requires internal APIs not exported from the DLL
diff --git a/source/common/CMakeLists.txt b/source/common/CMakeLists.txt
index 102ef22..541abe6 100644
--- a/source/common/CMakeLists.txt
+++ b/source/common/CMakeLists.txt
@@ -57,10 +57,10 @@ if(ENABLE_ASSEMBLY AND X86)
set(VEC_PRIMITIVES vec/vec-primitives.cpp ${PRIMITIVES})
source_group(Intrinsics FILES ${VEC_PRIMITIVES})
- set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+ set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h seaintegral.h)
set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm
mc-a2.asm pixel-util8.asm blockcopy8.asm
- pixeladd8.asm dct8.asm)
+ pixeladd8.asm dct8.asm seaintegral.asm)
if(HIGH_BIT_DEPTH)
set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm ipfilter16.asm loopfilter.asm)
else()
diff --git a/source/common/common.h b/source/common/common.h
index a7daf1d..82f5ccd 100644
--- a/source/common/common.h
+++ b/source/common/common.h
@@ -259,7 +259,6 @@ typedef int16_t coeff_t; // transform coefficient
#define LOG2_RASTER_SIZE (MAX_LOG2_CU_SIZE - LOG2_UNIT_SIZE)
#define RASTER_SIZE (1 << LOG2_RASTER_SIZE)
#define MAX_NUM_PARTITIONS (RASTER_SIZE * RASTER_SIZE)
-#define NUM_4x4_PARTITIONS (1U << (g_unitSizeDepth << 1)) // number of 4x4 units in max CU size
#define MIN_PU_SIZE 4
#define MIN_TU_SIZE 4
diff --git a/source/common/constants.cpp b/source/common/constants.cpp
index e360793..be1c926 100644
--- a/source/common/constants.cpp
+++ b/source/common/constants.cpp
@@ -161,7 +161,6 @@ const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET+1] =
65535
};
-int g_ctuSizeConfigured = 0;
uint32_t g_maxLog2CUSize = MAX_LOG2_CU_SIZE;
uint32_t g_maxCUSize = MAX_CU_SIZE;
uint32_t g_unitSizeDepth = NUM_CU_DEPTH;
diff --git a/source/common/constants.h b/source/common/constants.h
index f8b5d85..93731f4 100644
--- a/source/common/constants.h
+++ b/source/common/constants.h
@@ -30,8 +30,6 @@
namespace X265_NS {
// private namespace
-extern int g_ctuSizeConfigured;
-
extern double x265_lambda_tab[QP_MAX_MAX + 1];
extern double x265_lambda2_tab[QP_MAX_MAX + 1];
extern const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET + 1];
diff --git a/source/common/cpu.cpp b/source/common/cpu.cpp
index 7d51abf..1f17778 100644
--- a/source/common/cpu.cpp
+++ b/source/common/cpu.cpp
@@ -69,6 +69,7 @@ const cpu_name_t cpu_names[] =
{ "SSE2Slow", SSE2 | X265_CPU_SSE2_IS_SLOW },
{ "SSE2", SSE2 },
{ "SSE2Fast", SSE2 | X265_CPU_SSE2_IS_FAST },
+ { "LZCNT", X265_CPU_LZCNT },
{ "SSE3", SSE2 | X265_CPU_SSE3 },
{ "SSSE3", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
{ "SSE4.1", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },
@@ -78,16 +79,17 @@ const cpu_name_t cpu_names[] =
{ "AVX", AVX },
{ "XOP", AVX | X265_CPU_XOP },
{ "FMA4", AVX | X265_CPU_FMA4 },
- { "AVX2", AVX | X265_CPU_AVX2 },
{ "FMA3", AVX | X265_CPU_FMA3 },
+ { "BMI1", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+ { "BMI2", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+ { "AVX2", AVX2},
+#undef AVX2
#undef AVX
#undef SSE2
#undef MMX2
{ "Cache32", X265_CPU_CACHELINE_32 },
{ "Cache64", X265_CPU_CACHELINE_64 },
- { "LZCNT", X265_CPU_LZCNT },
- { "BMI1", X265_CPU_BMI1 },
- { "BMI2", X265_CPU_BMI1 | X265_CPU_BMI2 },
{ "SlowCTZ", X265_CPU_SLOW_CTZ },
{ "SlowAtom", X265_CPU_SLOW_ATOM },
{ "SlowPshufb", X265_CPU_SLOW_PSHUFB },
diff --git a/source/common/cudata.cpp b/source/common/cudata.cpp
index 639f6d6..7e69d87 100644
--- a/source/common/cudata.cpp
+++ b/source/common/cudata.cpp
@@ -28,6 +28,7 @@
#include "picyuv.h"
#include "mv.h"
#include "cudata.h"
+#define MAX_MV 1 << 14
using namespace X265_NS;
@@ -110,25 +111,23 @@ inline MV scaleMv(MV mv, int scale)
}
-cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
-uint32_t CUData::s_numPartInCUSize;
-
CUData::CUData()
{
memset(this, 0, sizeof(*this));
}
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance)
{
+ int csp = param.internalCsp;
m_chromaFormat = csp;
m_hChromaShift = CHROMA_H_SHIFT(csp);
m_vChromaShift = CHROMA_V_SHIFT(csp);
- m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2);
+ m_numPartitions = param.num4x4Partitions >> (depth * 2);
if (!s_partSet[0])
{
- s_numPartInCUSize = 1 << g_unitSizeDepth;
- switch (g_maxLog2CUSize)
+ s_numPartInCUSize = 1 << param.unitSizeDepth;
+ switch (param.maxLog2CUSize)
{
case 6:
s_partSet[0] = bcast256;
@@ -220,7 +219,7 @@ void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp,
m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
- uint32_t cuSize = g_maxCUSize >> depth;
+ uint32_t cuSize = param.maxCUSize >> depth;
m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
m_trCoeff[1] = m_trCoeff[2] = 0;
m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
@@ -262,7 +261,7 @@ void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp,
m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
- uint32_t cuSize = g_maxCUSize >> depth;
+ uint32_t cuSize = param.maxCUSize >> depth;
uint32_t sizeL = cuSize * cuSize;
uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
@@ -278,17 +277,17 @@ void CUData::initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t first
m_encData = frame.m_encData;
m_slice = m_encData->m_slice;
m_cuAddr = cuAddr;
- m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
- m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
+ m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
+ m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
m_absIdxInCTU = 0;
- m_numPartitions = NUM_4x4_PARTITIONS;
+ m_numPartitions = m_encData->m_param->num4x4Partitions;
m_bFirstRowInSlice = (uint8_t)firstRowInSlice;
m_bLastRowInSlice = (uint8_t)lastRowInSlice;
m_bLastCuInSlice = (uint8_t)lastCuInSlice;
/* sequential memsets */
m_partSet((uint8_t*)m_qp, (uint8_t)qp);
- m_partSet(m_log2CUSize, (uint8_t)g_maxLog2CUSize);
+ m_partSet(m_log2CUSize, (uint8_t)m_slice->m_param->maxLog2CUSize);
m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX);
m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX);
m_partSet(m_tqBypass, (uint8_t)frame.m_encData->m_param->bLossless);
@@ -390,7 +389,7 @@ void CUData::copyPartFrom(const CUData& subCU, const CUGeom& childGeom, uint32_t
memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));
- uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
+ uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2);
uint32_t tmp2 = subPartIdx * tmp;
memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);
@@ -489,7 +488,7 @@ void CUData::copyToPic(uint32_t depth) const
memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));
- uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+ uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
@@ -568,7 +567,7 @@ void CUData::updatePic(uint32_t depth, int picCsp) const
m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth);
m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]);
- uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+ uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
@@ -656,7 +655,7 @@ const CUData* CUData::getPUAboveLeft(uint32_t& alPartUnitIdx, uint32_t curPartUn
return m_cuLeft;
}
- alPartUnitIdx = NUM_4x4_PARTITIONS - 1;
+ alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1;
return m_cuAboveLeft;
}
@@ -799,7 +798,7 @@ const CUData* CUData::getPUAboveRightAdi(uint32_t& arPartUnitIdx, uint32_t curPa
/* Get left QpMinCu */
const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const
{
- uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+ uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
// check for left CTU boundary
@@ -816,7 +815,7 @@ const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxI
/* Get above QpMinCu */
const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const
{
- uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+ uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
// check for top CTU boundary
@@ -855,7 +854,7 @@ int CUData::getLastValidPartIdx(int absPartIdx) const
int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
{
- uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
+ uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
if (lastValidPartIdx >= 0)
@@ -865,7 +864,7 @@ int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
if (m_absIdxInCTU)
return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU);
else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
- return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS);
+ return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions);
else
return (int8_t)m_slice->m_sliceQp;
}
@@ -997,7 +996,7 @@ uint32_t CUData::getCtxSkipFlag(uint32_t absPartIdx) const
bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
{
- uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1);
+ uint32_t curPartNumb = m_encData->m_param->num4x4Partitions >> (depth << 1);
uint32_t curPartNumQ = curPartNumb >> 2;
if (m_cuDepth[absPartIdx] > depth)
@@ -1623,6 +1622,11 @@ uint32_t CUData::getInterMergeCandidates(uint32_t absPartIdx, uint32_t puIdx, MV
dir |= (1 << list);
candMvField[count][list].mv = colmv;
candMvField[count][list].refIdx = refIdx;
+ if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4)
+ {
+ MV dist(MAX_MV, MAX_MV);
+ candMvField[count][list].mv = dist;
+ }
}
}
@@ -1783,7 +1787,13 @@ int CUData::getPMV(InterNeighbourMV *neighbours, uint32_t picList, uint32_t refI
int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
int curPOC = m_slice->m_poc;
- pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
+ if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4))
+ {
+ MV dist(MAX_MV, MAX_MV);
+ pmv[numMvc++] = amvpCand[num++] = dist;
+ }
+ else
+ pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
}
}
@@ -1905,10 +1915,10 @@ void CUData::clipMv(MV& outMV) const
uint32_t offset = 8;
int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);
- int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);
+ int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift);
int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);
- int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);
+ int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift);
outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));
outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));
@@ -2090,6 +2100,8 @@ void CUData::getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uin
void CUData::calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS])
{
+ uint32_t num4x4Partition = (1U << ((g_log2Size[maxCUSize] - LOG2_UNIT_SIZE) << 1));
+
// Initialize the coding blocks inside the CTB
for (uint32_t log2CUSize = g_log2Size[maxCUSize], rangeCUIdx = 0; log2CUSize >= g_log2Size[minCUSize]; log2CUSize--)
{
@@ -2118,7 +2130,7 @@ void CUData::calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUS
cu->log2CUSize = log2CUSize;
cu->childOffset = childIdx - cuIdx;
cu->absPartIdx = g_depthScanIdx[yOffset][xOffset] * 4;
- cu->numPartitions = (NUM_4x4_PARTITIONS >> ((g_maxLog2CUSize - cu->log2CUSize) * 2));
+ cu->numPartitions = (num4x4Partition >> ((g_log2Size[maxCUSize] - cu->log2CUSize) * 2));
cu->depth = g_log2Size[maxCUSize] - log2CUSize;
cu->geomRecurId = cuIdx;
diff --git a/source/common/cudata.h b/source/common/cudata.h
index adb3082..b3e6f30 100644
--- a/source/common/cudata.h
+++ b/source/common/cudata.h
@@ -161,8 +161,8 @@ class CUData
{
public:
- static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
- static uint32_t s_numPartInCUSize;
+ cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
+ uint32_t s_numPartInCUSize;
bool m_vbvAffected;
@@ -225,7 +225,7 @@ public:
CUData();
- void initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance);
+ void initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance);
static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);
void initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice);
@@ -271,7 +271,7 @@ public:
void getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) |
(((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
- uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
+ uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
uint32_t getNumPartInter(uint32_t absPartIdx) const { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
bool isIntra(uint32_t absPartIdx) const { return m_predMode[absPartIdx] == MODE_INTRA; }
@@ -285,7 +285,7 @@ public:
void getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
int getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;
- uint32_t getSCUAddr() const { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; }
+ uint32_t getSCUAddr() const { return (m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; }
uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
uint32_t getCtxSkipFlag(uint32_t absPartIdx) const;
void getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const;
@@ -350,10 +350,10 @@ struct CUDataMemPool
CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }
- bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
+ bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param)
{
- uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2);
- uint32_t cuSize = g_maxCUSize >> depth;
+ uint32_t numPartition = param.num4x4Partitions >> (depth * 2);
+ uint32_t cuSize = param.maxCUSize >> depth;
uint32_t sizeL = cuSize * cuSize;
if (csp == X265_CSP_I400)
{
diff --git a/source/common/frame.cpp b/source/common/frame.cpp
index aefe9a6..3111bb9 100644
--- a/source/common/frame.cpp
+++ b/source/common/frame.cpp
@@ -48,6 +48,11 @@ Frame::Frame()
m_rcData = NULL;
m_encodeStartTime = 0;
m_reconfigureRc = false;
+ m_ctuInfo = NULL;
+ m_prevCtuInfoChange = NULL;
+ m_addOnDepth = NULL;
+ m_addOnCtuInfo = NULL;
+ m_addOnPrevChange = NULL;
}
bool Frame::create(x265_param *param, float* quantOffsets)
@@ -56,11 +61,26 @@ bool Frame::create(x265_param *param, float* quantOffsets)
m_param = param;
CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
- if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
- m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
+ if (param->bCTUInfo)
+ {
+ uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCTU = (m_param->sourceHeight + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t numCTUsInFrame = widthInCTU * heightInCTU;
+ CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame);
+ CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame);
+ CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame);
+ for (uint32_t i = 0; i < numCTUsInFrame; i++)
+ {
+ CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions));
+ CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions));
+ CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions));
+ }
+ }
+
+ if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
{
X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
- m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+ m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
m_reconRowFlag = new ThreadSafeInteger[m_numRows];
m_reconColCount = new ThreadSafeInteger[m_numRows];
@@ -86,12 +106,12 @@ bool Frame::allocEncodeData(x265_param *param, const SPS& sps)
m_reconPic = new PicYuv;
m_param = param;
m_encData->m_reconPic = m_reconPic;
- bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp);
+ bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param);
if (ok)
{
/* initialize right border of m_reconpicYuv as SAO may read beyond the
* end of the picture accessing uninitialized pixels */
- int maxHeight = sps.numCuInHeight * g_maxCUSize;
+ int maxHeight = sps.numCuInHeight * param->maxCUSize;
memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight);
/* use pre-calculated cu/pu offsets cached in the SPS structure */
@@ -166,6 +186,35 @@ void Frame::destroy()
delete[] m_userSEI.payloads;
}
+ if (m_ctuInfo)
+ {
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t numCUsInFrame = widthInCU * heightInCU;
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
+ {
+ X265_FREE((*m_ctuInfo + i)->ctuInfo);
+ (*m_ctuInfo + i)->ctuInfo = NULL;
+ X265_FREE(m_addOnDepth[i]);
+ m_addOnDepth[i] = NULL;
+ X265_FREE(m_addOnCtuInfo[i]);
+ m_addOnCtuInfo[i] = NULL;
+ X265_FREE(m_addOnPrevChange[i]);
+ m_addOnPrevChange[i] = NULL;
+ }
+ X265_FREE(*m_ctuInfo);
+ *m_ctuInfo = NULL;
+ X265_FREE(m_ctuInfo);
+ m_ctuInfo = NULL;
+ X265_FREE(m_prevCtuInfoChange);
+ m_prevCtuInfoChange = NULL;
+ X265_FREE(m_addOnDepth);
+ m_addOnDepth = NULL;
+ X265_FREE(m_addOnCtuInfo);
+ m_addOnCtuInfo = NULL;
+ X265_FREE(m_addOnPrevChange);
+ m_addOnPrevChange = NULL;
+ }
m_lowres.destroy();
X265_FREE(m_rcData);
}
diff --git a/source/common/frame.h b/source/common/frame.h
index 0eae3fd..0ad1173 100644
--- a/source/common/frame.h
+++ b/source/common/frame.h
@@ -66,6 +66,10 @@ struct RcStats
double shortTermCplxCount;
int64_t totalBits;
int64_t encodedBits;
+ double coeff[4];
+ double count[4];
+ double offset[4];
+ double bufferFillFinal;
};
class Frame
@@ -108,7 +112,14 @@ public:
x265_analysis_2Pass m_analysis2Pass;
RcStats* m_rcData;
+ x265_ctu_info_t** m_ctuInfo;
+ Event m_copied;
+ int* m_prevCtuInfoChange;
int64_t m_encodeStartTime;
+
+ uint8_t** m_addOnDepth;
+ uint8_t** m_addOnCtuInfo;
+ int** m_addOnPrevChange;
Frame();
bool create(x265_param *param, float* quantOffsets);
diff --git a/source/common/framedata.cpp b/source/common/framedata.cpp
index 00a74c1..6292b9f 100644
--- a/source/common/framedata.cpp
+++ b/source/common/framedata.cpp
@@ -41,9 +41,9 @@ bool FrameData::create(const x265_param& param, const SPS& sps, int csp)
if (param.rc.bStatWrite)
m_spsrps = const_cast<RPS*>(sps.spsrps);
- m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame);
+ m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame, param);
for (uint32_t ctuAddr = 0; ctuAddr < sps.numCUsInFrame; ctuAddr++)
- m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param.internalCsp, ctuAddr);
+ m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param, ctuAddr);
CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);
diff --git a/source/common/framedata.h b/source/common/framedata.h
index 0004a46..d17b53f 100644
--- a/source/common/framedata.h
+++ b/source/common/framedata.h
@@ -62,6 +62,7 @@ struct FrameStats
double percentMergeCu[NUM_CU_DEPTH];
double percentIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
double percentInterDistribution[NUM_CU_DEPTH][3]; // 2Nx2N, RECT, AMP modes percentage
+ double ipCostRatio;
uint64_t cntIntraNxN;
uint64_t totalCu;
@@ -78,6 +79,15 @@ struct FrameStats
uint64_t cuInterDistribution[NUM_CU_DEPTH][INTER_MODES];
uint64_t cuIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
+
+ uint64_t totalPu[NUM_CU_DEPTH + 1];
+ uint64_t cntSkipPu[NUM_CU_DEPTH];
+ uint64_t cntIntraPu[NUM_CU_DEPTH];
+ uint64_t cntAmp[NUM_CU_DEPTH];
+ uint64_t cnt4x4;
+ uint64_t cntInterPu[NUM_CU_DEPTH][INTER_MODES - 1];
+ uint64_t cntMergePu[NUM_CU_DEPTH][INTER_MODES - 1];
+
FrameStats()
{
memset(this, 0, sizeof(FrameStats));
diff --git a/source/common/ipfilter.cpp b/source/common/ipfilter.cpp
index 842b478..acfd7ce 100644
--- a/source/common/ipfilter.cpp
+++ b/source/common/ipfilter.cpp
@@ -123,9 +123,8 @@ void interp_horiz_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intpt
const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int shift = IF_FILTER_PREC - headRoom;
- int offset = -IF_INTERNAL_OFFS << shift;
+ int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
int blkheight = height;
-
src -= N / 2 - 1;
if (isRowExt)
@@ -209,10 +208,8 @@ void interp_vert_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr
const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int shift = IF_FILTER_PREC - headRoom;
- int offset = -IF_INTERNAL_OFFS << shift;
-
+ int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
src -= (N / 2 - 1) * srcStride;
-
int row, col;
for (row = 0; row < height; row++)
{
diff --git a/source/common/lowres.h b/source/common/lowres.h
index 125f4e2..4cb4d00 100644
--- a/source/common/lowres.h
+++ b/source/common/lowres.h
@@ -118,6 +118,8 @@ struct Lowres : public ReferencePlanes
bool bKeyframe;
bool bLastMiniGopBFrame;
+ double ipCostRatio;
+
/* lookahead output data */
int64_t costEst[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
int64_t costEstAq[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
diff --git a/source/common/param.cpp b/source/common/param.cpp
index 70bb478..661ef5b 100644
--- a/source/common/param.cpp
+++ b/source/common/param.cpp
@@ -110,6 +110,7 @@ void x265_param_default(x265_param* param)
param->frameNumThreads = 0;
param->logLevel = X265_LOG_INFO;
+ param->csvLogLevel = 0;
param->csvfn = NULL;
param->rc.lambdaFileName = NULL;
param->bLogCuStats = 0;
@@ -194,10 +195,10 @@ void x265_param_default(x265_param* param)
param->rdPenalty = 0;
param->psyRd = 2.0;
param->psyRdoq = 0.0;
- param->analysisMode = 0;
+ param->analysisReuseMode = 0;
param->analysisMultiPassRefine = 0;
param->analysisMultiPassDistortion = 0;
- param->analysisFileName = NULL;
+ param->analysisReuseFileName = NULL;
param->bIntraInBFrames = 0;
param->bLossless = 0;
param->bCULossless = 0;
@@ -236,6 +237,7 @@ void x265_param_default(x265_param* param)
param->rc.bEnableGrain = 0;
param->rc.qpMin = 0;
param->rc.qpMax = QP_MAX_MAX;
+ param->rc.bEnableConstVbv = 0;
/* Video Usability Information (VUI) */
param->vui.aspectRatioIdc = 0;
@@ -271,10 +273,18 @@ void x265_param_default(x265_param* param)
param->bOptCUDeltaQP = 0;
param->bAQMotion = 0;
param->bHDROpt = 0;
- param->analysisRefineLevel = 5;
+ param->analysisReuseLevel = 5;
param->toneMapFile = NULL;
param->bDhdr10opt = 0;
+ param->bCTUInfo = 0;
+ param->bUseRcStats = 0;
+ param->scaleFactor = 0;
+ param->intraRefine = 0;
+ param->interRefine = 0;
+ param->mvRefine = 0;
+ param->bUseAnalysisFile = 1;
+ param->csvfpt = NULL;
}
int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)
@@ -494,6 +504,7 @@ int x265_param_default_preset(x265_param* param, const char* preset, const char*
param->psyRd = 4.0;
param->psyRdoq = 10.0;
param->bEnableSAO = 0;
+ param->rc.bEnableConstVbv = 1;
}
else
return -1;
@@ -828,7 +839,7 @@ int x265_param_parse(x265_param* p, const char* name, const char* value)
p->rc.bStrictCbr = atobool(value);
p->rc.pbFactor = 1.0;
}
- OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError);
+ OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError);
OPT("sar")
{
p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError);
@@ -907,7 +918,7 @@ int x265_param_parse(x265_param* p, const char* name, const char* value)
OPT("scaling-list") p->scalingLists = strdup(value);
OPT2("pools", "numa-pools") p->numaPools = strdup(value);
OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
- OPT("analysis-file") p->analysisFileName = strdup(value);
+ OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value);
OPT("qg-size") p->rc.qgSize = atoi(value);
OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2;
@@ -921,6 +932,8 @@ int x265_param_parse(x265_param* p, const char* name, const char* value)
if (bExtraParams)
{
if (0) ;
+ OPT("csv") p->csvfn = strdup(value);
+ OPT("csv-log-level") p->csvLogLevel = atoi(value);
OPT("qpmin") p->rc.qpMin = atoi(value);
OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value);
OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value);
@@ -938,7 +951,7 @@ int x265_param_parse(x265_param* p, const char* name, const char* value)
OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
OPT("aq-motion") p->bAQMotion = atobool(value);
OPT("dynamic-rd") p->dynamicRd = atof(value);
- OPT("refine-level") p->analysisRefineLevel = atoi(value);
+ OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value);
OPT("ssim-rd")
{
int bval = atobool(value);
@@ -954,6 +967,12 @@ int x265_param_parse(x265_param* p, const char* name, const char* value)
OPT("limit-sao") p->bLimitSAO = atobool(value);
OPT("dhdr10-info") p->toneMapFile = strdup(value);
OPT("dhdr10-opt") p->bDhdr10opt = atobool(value);
+ OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value);
+ OPT("ctu-info") p->bCTUInfo = atoi(value);
+ OPT("scale-factor") p->scaleFactor = atoi(value);
+ OPT("refine-intra")p->intraRefine = atoi(value);
+ OPT("refine-inter")p->interRefine = atobool(value);
+ OPT("refine-mv")p->mvRefine = atobool(value);
else
return X265_PARAM_BAD_NAME;
}
@@ -1284,16 +1303,19 @@ int x265_check_params(x265_param* param)
"Constant QP is incompatible with 2pass");
CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0),
"Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize");
- CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD),
+ CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD),
"Invalid analysis mode. Analysis mode 0: OFF 1: SAVE : 2 LOAD");
- CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10),
+ CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10),
"Invalid analysis refine level. Value must be between 1 and 10 (inclusive)");
+ CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2");
CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX,
"qpmax exceeds supported range (0 to 69)");
CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX,
"qpmin exceeds supported range (0 to 69)");
CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16,
"Supported range for log2MaxPocLsb is 4 to 16");
+ CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6,
+ "Supported values for bCTUInfo are 0, 1, 2, 4, 6");
#if !X86_64
CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
"SEA motion search does not support resolutions greater than 480p in 32 bit build");
@@ -1322,42 +1344,6 @@ void x265_param_apply_fastfirstpass(x265_param* param)
}
}
-int x265_set_globals(x265_param* param)
-{
- uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
- uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
-
- Lock gLock;
- ScopedLock sLock(gLock);
-
- if (++g_ctuSizeConfigured > 1)
- {
- if (g_maxCUSize != param->maxCUSize)
- {
- x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process");
- }
- if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize)
- {
- x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process");
- }
- param->maxCUSize = g_maxCUSize;
- return x265_check_params(param); /* Check again, since param may have changed */
- }
- else
- {
- // set max CU width & height
- g_maxCUSize = param->maxCUSize;
- g_maxLog2CUSize = maxLog2CUSize;
-
- // compute actual CU depth with respect to config depth and max transform size
- g_maxCUDepth = maxLog2CUSize - minLog2CUSize;
- g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE;
- }
-
- g_maxSlices = param->maxSlices;
- return 0;
-}
-
static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr)
{
static const int overhead = (int)strlen("x265 [info]: tools: ");
@@ -1457,6 +1443,7 @@ void x265_print_params(x265_param* param)
TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
TOOLVAL(param->lookaheadSlices, "lslices=%d");
TOOLVAL(param->lookaheadThreads, "lthreads=%d")
+ TOOLVAL(param->bCTUInfo, "ctu-info=%d");
if (param->maxSlices > 1)
TOOLVAL(param->maxSlices, "slices=%d");
if (param->bEnableLoopFilter)
@@ -1473,8 +1460,8 @@ void x265_print_params(x265_param* param)
TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao");
TOOLOPT(param->rc.bStatWrite, "stats-write");
TOOLOPT(param->rc.bStatRead, "stats-read");
-#if ENABLE_DYNAMIC_HDR10
- TOOLVAL(param->toneMapFile != NULL, "dhdr10-info");
+#if ENABLE_HDR10_PLUS
+ TOOLOPT(param->toneMapFile != NULL, "dhdr10-info");
#endif
x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
fflush(stderr);
@@ -1501,6 +1488,8 @@ char *x265_param2string(x265_param* p, int padx, int pady)
BOOL(p->bEnablePsnr, "psnr");
BOOL(p->bEnableSsim, "ssim");
s += sprintf(s, " log-level=%d", p->logLevel);
+ if (p->csvfn)
+ s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel);
s += sprintf(s, " bitdepth=%d", p->internalBitDepth);
s += sprintf(s, " input-csp=%d", p->internalCsp);
s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom);
@@ -1573,7 +1562,7 @@ char *x265_param2string(x265_param* p, int padx, int pady)
s += sprintf(s, " psy-rd=%.2f", p->psyRd);
s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);
BOOL(p->bEnableRdRefine, "rd-refine");
- s += sprintf(s, " analysis-mode=%d", p->analysisMode);
+ s += sprintf(s, " analysis-reuse-mode=%d", p->analysisReuseMode);
BOOL(p->bLossless, "lossless");
s += sprintf(s, " cbqpoffs=%d", p->cbQpOffset);
s += sprintf(s, " crqpoffs=%d", p->crQpOffset);
@@ -1630,6 +1619,7 @@ char *x265_param2string(x265_param* p, int padx, int pady)
s += sprintf(s, " qg-size=%d", p->rc.qgSize);
BOOL(p->rc.bEnableGrain, "rc-grain");
s += sprintf(s, " qpmax=%d qpmin=%d", p->rc.qpMax, p->rc.qpMin);
+ BOOL(p->rc.bEnableConstVbv, "const-vbv");
s += sprintf(s, " sar=%d", p->vui.aspectRatioIdc);
if (p->vui.aspectRatioIdc == X265_EXTENDED_SAR)
s += sprintf(s, " sar-width : sar-height=%d:%d", p->vui.sarWidth, p->vui.sarHeight);
@@ -1668,8 +1658,13 @@ char *x265_param2string(x265_param* p, int padx, int pady)
BOOL(p->bEmitHDRSEI, "hdr");
BOOL(p->bHDROpt, "hdr-opt");
BOOL(p->bDhdr10opt, "dhdr10-opt");
- s += sprintf(s, " refine-level=%d", p->analysisRefineLevel);
+ s += sprintf(s, " analysis-reuse-level=%d", p->analysisReuseLevel);
+ s += sprintf(s, " scale-factor=%d", p->scaleFactor);
+ s += sprintf(s, " refine-intra=%d", p->intraRefine);
+ s += sprintf(s, " refine-inter=%d", p->interRefine);
+ s += sprintf(s, " refine-mv=%d", p->mvRefine);
BOOL(p->bLimitSAO, "limit-sao");
+ s += sprintf(s, " ctu-info=%d", p->bCTUInfo);
#undef BOOL
return buf;
}
diff --git a/source/common/param.h b/source/common/param.h
index f6f03a1..9424b44 100644
--- a/source/common/param.h
+++ b/source/common/param.h
@@ -28,7 +28,6 @@
namespace X265_NS {
int x265_check_params(x265_param *param);
-int x265_set_globals(x265_param *param);
void x265_print_params(x265_param *param);
void x265_param_apply_fastfirstpass(x265_param *p);
char* x265_param2string(x265_param *param, int padx, int pady);
diff --git a/source/common/picyuv.cpp b/source/common/picyuv.cpp
index ca5d327..01eb955 100644
--- a/source/common/picyuv.cpp
+++ b/source/common/picyuv.cpp
@@ -46,36 +46,62 @@ PicYuv::PicYuv()
m_maxLumaLevel = 0;
m_avgLumaLevel = 0;
+
+ m_maxChromaULevel = 0;
+ m_avgChromaULevel = 0;
+
+ m_maxChromaVLevel = 0;
+ m_avgChromaVLevel = 0;
+
+#if (X265_DEPTH > 8)
+ m_minLumaLevel = 0xFFFF;
+ m_minChromaULevel = 0xFFFF;
+ m_minChromaVLevel = 0xFFFF;
+#else
+ m_minLumaLevel = 0xFF;
+ m_minChromaULevel = 0xFF;
+ m_minChromaVLevel = 0xFF;
+#endif
+
m_stride = 0;
m_strideC = 0;
m_hChromaShift = 0;
m_vChromaShift = 0;
}
-bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+bool PicYuv::create(x265_param* param, pixel *pixelbuf)
{
+ m_param = param;
+ uint32_t picWidth = m_param->sourceWidth;
+ uint32_t picHeight = m_param->sourceHeight;
+ uint32_t picCsp = m_param->internalCsp;
m_picWidth = picWidth;
m_picHeight = picHeight;
m_hChromaShift = CHROMA_H_SHIFT(picCsp);
m_vChromaShift = CHROMA_V_SHIFT(picCsp);
m_picCsp = picCsp;
- uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1) / g_maxCUSize;
- uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+ uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1) / param->maxCUSize;
+ uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
- m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
- m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding
- m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1);
+ m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+ m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+ m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
- int maxHeight = numCuInHeight * g_maxCUSize;
- CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
- m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+ int maxHeight = numCuInHeight * param->maxCUSize;
+ if (pixelbuf)
+ m_picOrg[0] = pixelbuf;
+ else
+ {
+ CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+ m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+ }
if (picCsp != X265_CSP_I400)
{
m_chromaMarginX = m_lumaMarginX; // keep 16-byte alignment for chroma CTUs
m_chromaMarginY = m_lumaMarginY >> m_vChromaShift;
- m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
+ m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
@@ -94,12 +120,33 @@ fail:
return false;
}
+int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+{
+ m_picWidth = picWidth;
+ m_picHeight = picHeight;
+ m_hChromaShift = CHROMA_H_SHIFT(picCsp);
+ m_vChromaShift = CHROMA_V_SHIFT(picCsp);
+ m_picCsp = picCsp;
+
+ uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+ m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+ m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+ m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1);
+
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
+ int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2)));
+
+ return bufLen;
+}
+
/* the first picture allocated by the encoder will be asked to generate these
* offset arrays. Once generated, they will be provided to all future PicYuv
* allocated by the same encoder. */
bool PicYuv::createOffsets(const SPS& sps)
{
- uint32_t numPartitions = 1 << (g_unitSizeDepth * 2);
+ uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2);
if (m_picCsp != X265_CSP_I400)
{
@@ -109,8 +156,8 @@ bool PicYuv::createOffsets(const SPS& sps)
{
for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
{
- m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
- m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift);
+ m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
+ m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift);
}
}
@@ -129,7 +176,7 @@ bool PicYuv::createOffsets(const SPS& sps)
CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight);
for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++)
for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
- m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
+ m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions);
for (uint32_t idx = 0; idx < numPartitions; ++idx)
@@ -184,6 +231,11 @@ void PicYuv::copyFromPicture(const x265_picture& pic, const x265_param& param, i
X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure");
+ uint64_t lumaSum;
+ uint64_t cbSum;
+ uint64_t crSum;
+ lumaSum = cbSum = crSum = 0;
+
if (pic.bitDepth == 8)
{
#if (X265_DEPTH > 8)
@@ -288,6 +340,47 @@ void PicYuv::copyFromPicture(const x265_picture& pic, const x265_param& param, i
pixel *U = m_picOrg[1];
pixel *V = m_picOrg[2];
+ pixel *yPic = m_picOrg[0];
+ pixel *uPic = m_picOrg[1];
+ pixel *vPic = m_picOrg[2];
+
+ for (int r = 0; r < height; r++)
+ {
+ for (int c = 0; c < width; c++)
+ {
+ m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel);
+ m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel);
+ lumaSum += yPic[c];
+ }
+ yPic += m_stride;
+ }
+ m_avgLumaLevel = (double)lumaSum / (m_picHeight * m_picWidth);
+
+ if (param.csvLogLevel >= 2)
+ {
+ if (param.internalCsp != X265_CSP_I400)
+ {
+ for (int r = 0; r < height >> m_vChromaShift; r++)
+ {
+ for (int c = 0; c < width >> m_hChromaShift; c++)
+ {
+ m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel);
+ m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel);
+ cbSum += uPic[c];
+
+ m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel);
+ m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel);
+ crSum += vPic[c];
+ }
+
+ uPic += m_strideC;
+ vPic += m_strideC;
+ }
+ m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+ m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+ }
+ }
+
#if HIGH_BIT_DEPTH
bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX);
/* Apply min/max luma bounds for HDR pixel manipulations */
diff --git a/source/common/picyuv.h b/source/common/picyuv.h
index c2e9238..0c8dfa7 100644
--- a/source/common/picyuv.h
+++ b/source/common/picyuv.h
@@ -60,14 +60,25 @@ public:
uint32_t m_chromaMarginX;
uint32_t m_chromaMarginY;
- pixel m_maxLumaLevel;
- double m_avgLumaLevel;
+ pixel m_maxLumaLevel;
+ pixel m_minLumaLevel;
+ double m_avgLumaLevel;
+
+ pixel m_maxChromaULevel;
+ pixel m_minChromaULevel;
+ double m_avgChromaULevel;
+
+ pixel m_maxChromaVLevel;
+ pixel m_minChromaVLevel;
+ double m_avgChromaVLevel;
+ x265_param *m_param;
PicYuv();
- bool create(uint32_t picWidth, uint32_t picHeight, uint32_t csp);
+ bool create(x265_param* param, pixel *pixelbuf = NULL);
bool createOffsets(const SPS& sps);
void destroy();
+ int getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp);
void copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
diff --git a/source/common/primitives.cpp b/source/common/primitives.cpp
index aa72496..211dc2f 100644
--- a/source/common/primitives.cpp
+++ b/source/common/primitives.cpp
@@ -57,6 +57,7 @@ void setupFilterPrimitives_c(EncoderPrimitives &p);
void setupIntraPrimitives_c(EncoderPrimitives &p);
void setupLoopFilterPrimitives_c(EncoderPrimitives &p);
void setupSaoPrimitives_c(EncoderPrimitives &p);
+void setupSeaIntegralPrimitives_c(EncoderPrimitives &p);
void setupCPrimitives(EncoderPrimitives &p)
{
@@ -66,6 +67,7 @@ void setupCPrimitives(EncoderPrimitives &p)
setupIntraPrimitives_c(p); // intrapred.cpp
setupLoopFilterPrimitives_c(p); // loopfilter.cpp
setupSaoPrimitives_c(p); // sao.cpp
+ setupSeaIntegralPrimitives_c(p); // framefilter.cpp
}
void setupAliasPrimitives(EncoderPrimitives &p)
diff --git a/source/common/primitives.h b/source/common/primitives.h
index edee097..cf0bc29 100644
--- a/source/common/primitives.h
+++ b/source/common/primitives.h
@@ -110,6 +110,17 @@ enum ChromaCU422
BLOCK_422_32x64
};
+enum IntegralSize
+{
+ INTEGRAL_4,
+ INTEGRAL_8,
+ INTEGRAL_12,
+ INTEGRAL_16,
+ INTEGRAL_24,
+ INTEGRAL_32,
+ NUM_INTEGRAL_SIZE
+};
+
typedef int (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
typedef int (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride);
typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
@@ -203,6 +214,9 @@ typedef uint32_t (*costC1C2Flag_t)(uint16_t *absCoeff, intptr_t numC1Flag, uint8
typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ);
typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ);
+typedef void (*integralv_t)(uint32_t *sum, intptr_t stride);
+typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride);
+
/* Function pointers to optimized encoder primitives. Each pointer can reference
* either an assembly routine, a SIMD intrinsic primitive, or a C function */
struct EncoderPrimitives
@@ -342,6 +356,9 @@ struct EncoderPrimitives
pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1
pelFilterChroma_t pelFilterChroma[2]; // EDGE_VER = 0, EDGE_HOR = 1
+ integralv_t integral_initv[NUM_INTEGRAL_SIZE];
+ integralh_t integral_inith[NUM_INTEGRAL_SIZE];
+
/* There is one set of chroma primitives per color space. An encoder will
* have just a single color space and thus it will only ever use one entry
* in this array. However we always fill all entries in the array in case
diff --git a/source/common/slice.cpp b/source/common/slice.cpp
index 3d5a5c9..2335ce6 100644
--- a/source/common/slice.cpp
+++ b/source/common/slice.cpp
@@ -185,22 +185,22 @@ void RPS::sortDeltaPOC()
uint32_t Slice::realEndAddress(uint32_t endCUAddr) const
{
// Calculate end address
- uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS;
- uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS;
- uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize;
- uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize;
+ uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions;
+ uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions;
+ uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize;
+ uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize;
while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax)
internalAddress--;
internalAddress++;
- if (internalAddress == NUM_4x4_PARTITIONS)
+ if (internalAddress == m_param->num4x4Partitions)
{
internalAddress = 0;
externalAddress++;
}
- return externalAddress * NUM_4x4_PARTITIONS + internalAddress;
+ return externalAddress * m_param->num4x4Partitions + internalAddress;
}
diff --git a/source/common/slice.h b/source/common/slice.h
index 160ebf5..d08da58 100644
--- a/source/common/slice.h
+++ b/source/common/slice.h
@@ -360,6 +360,7 @@ public:
int m_iPPSQpMinus26;
int numRefIdxDefault[2];
int m_iNumRPSInSPS;
+ const x265_param *m_param;
Slice()
{
diff --git a/source/common/threadpool.cpp b/source/common/threadpool.cpp
index f6509b7..a23ba7b 100644
--- a/source/common/threadpool.cpp
+++ b/source/common/threadpool.cpp
@@ -253,6 +253,7 @@ ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools, bool isTh
int cpusPerNode[MAX_NODE_NUM + 1];
int threadsPerPool[MAX_NODE_NUM + 2];
uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2];
+ int totalNumThreads = 0;
memset(cpusPerNode, 0, sizeof(cpusPerNode));
memset(threadsPerPool, 0, sizeof(threadsPerPool));
@@ -388,9 +389,23 @@ ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools, bool isTh
if (bNumaSupport)
x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]);
if (threadsPerPool[i])
+ {
numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS;
+ totalNumThreads += threadsPerPool[i];
+ }
}
+ if (!isThreadsReserved)
+ {
+ if (!numPools)
+ {
+ x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n");
+ totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads
+ }
+ if (!p->frameNumThreads)
+ ThreadPool::getFrameThreadsCount(p, totalNumThreads);
+ }
+
if (!numPools)
return NULL;
@@ -412,7 +427,7 @@ ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools, bool isTh
node++;
int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
int origNumThreads = numThreads;
- if (p->lookaheadThreads > numThreads / 2)
+ if (i == 0 && p->lookaheadThreads > numThreads / 2)
{
p->lookaheadThreads = numThreads / 2;
x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");
@@ -423,7 +438,7 @@ ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools, bool isTh
maxProviders = 1;
}
- else
+ else if (i == 0)
numThreads -= p->lookaheadThreads;
if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
{
@@ -643,4 +658,21 @@ int ThreadPool::getCpuCount()
#endif
}
+void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount)
+{
+ int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize];
+ if (!p->bEnableWavefront)
+ p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
+ else if (cpuCount >= 32)
+ p->frameNumThreads = (p->sourceHeight > 2000) ? 6 : 5;
+ else if (cpuCount >= 16)
+ p->frameNumThreads = 4;
+ else if (cpuCount >= 8)
+ p->frameNumThreads = 3;
+ else if (cpuCount >= 4)
+ p->frameNumThreads = 2;
+ else
+ p->frameNumThreads = 1;
+}
+
} // end namespace X265_NS
diff --git a/source/common/threadpool.h b/source/common/threadpool.h
index 649716d..6f58a70 100644
--- a/source/common/threadpool.h
+++ b/source/common/threadpool.h
@@ -105,6 +105,7 @@ public:
static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
static int getCpuCount();
static int getNumaNodeCount();
+ static void getFrameThreadsCount(x265_param* p,int cpuCount);
};
/* Any worker thread may enlist the help of idle worker threads from the same
diff --git a/source/common/x86/asm-primitives.cpp b/source/common/x86/asm-primitives.cpp
index fad3c7a..1546734 100644
--- a/source/common/x86/asm-primitives.cpp
+++ b/source/common/x86/asm-primitives.cpp
@@ -114,6 +114,7 @@ extern "C" {
#include "blockcopy8.h"
#include "intrapred.h"
#include "dct8.h"
+#include "seaintegral.h"
}
#define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \
@@ -2157,6 +2158,17 @@ void setupAssemblyPrimitives(EncoderPrimitives &p, int cpuMask) // Main10
p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
p.fix8Pack = PFX(cutree_fix8_pack_avx2);
+ p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+ p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+ p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+ p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+ p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+ p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+ p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+ p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+ p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+ p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
/* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only
p.planeClipAndMax = PFX(planeClipAndMax_avx2); */
@@ -3695,6 +3707,19 @@ void setupAssemblyPrimitives(EncoderPrimitives &p, int cpuMask) // Main
p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
p.fix8Pack = PFX(cutree_fix8_pack_avx2);
+ p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+ p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+ p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+ p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+ p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+ p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+ p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+ p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+ p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+ p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+ p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+ p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
}
#endif
}
diff --git a/source/common/x86/loopfilter.asm b/source/common/x86/loopfilter.asm
index d7d6e89..7e1ed06 100644
--- a/source/common/x86/loopfilter.asm
+++ b/source/common/x86/loopfilter.asm
@@ -1583,7 +1583,7 @@ cglobal saoCuOrgB0, 5,7,8
pshufb m1, m4, m0
pcmpgtb m0, [pb_15] ; m0 = [mask]
- pblendvb m6, m6, m1, m0 ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+ pblendvb m6, m1, m0
pmovsxbw m0, m6 ; offset
punpckhbw m6, m6
@@ -1630,7 +1630,7 @@ cglobal saoCuOrgB0, 4, 7, 8
pshufb m6, m3, m1
pshufb m5, m4, m1
- pblendvb m6, m6, m5, m0 ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+ pblendvb m6, m5, m0
pmovzxbw m1, m2 ; rec
punpckhbw m2, m7
@@ -1904,7 +1904,7 @@ cglobal calSign, 4,5,6
sub r3, r4
movu xmm0, [r3]
movu m3, [r0]
- pblendvb m5, m5, m3, xmm0
+ pblendvb m5, m3, xmm0
movu [r0], m5
.end:
diff --git a/source/common/x86/pixel-a.asm b/source/common/x86/pixel-a.asm
index eaaee77..79b3bb5 100644
--- a/source/common/x86/pixel-a.asm
+++ b/source/common/x86/pixel-a.asm
@@ -227,7 +227,7 @@ cextern pw_pixel_max
; clobber: m3..m7
; out: %1 = satd
%macro SATD_4x4_MMX 3
- %xdefine %%n n%1
+ %xdefine %%n nn%1
%assign offset %2*SIZEOF_PIXEL
LOAD_DIFF m4, m3, none, [r0+ offset], [r2+ offset]
LOAD_DIFF m5, m3, none, [r0+ r1+offset], [r2+ r3+offset]
diff --git a/source/common/x86/pixel-util8.asm b/source/common/x86/pixel-util8.asm
index cb74056..11c2500 100644
--- a/source/common/x86/pixel-util8.asm
+++ b/source/common/x86/pixel-util8.asm
@@ -1597,7 +1597,7 @@ cglobal weight_sp, 6,7,8
.widthLess8:
movu m6, [r1]
- pblendvb m6, m6, m7, m0
+ pblendvb m6, m7, m0
movu [r1], m6
.nextH:
diff --git a/source/common/x86/seaintegral.asm b/source/common/x86/seaintegral.asm
new file mode 100644
index 0000000..cf79ca4
--- /dev/null
+++ b/source/common/x86/seaintegral.asm
@@ -0,0 +1,1062 @@
+;*****************************************************************************
+;* Copyright (C) 2013-2017 MulticoreWare, Inc
+;*
+;* Authors: Jayashri Murugan <jayashri at multicorewareinc.com>
+;* Vignesh V Menon <vignesh at multicorewareinc.com>
+;* Praveen Tiwari <praveen at multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at license @ x265.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION .text
+
+;-----------------------------------------------------------------------------
+;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral4v, 2, 3, 2
+ mov r2, r1
+ shl r2, 4
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+;-----------------------------------------------------------------------------
+;void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral8v, 2, 3, 2
+ mov r2, r1
+ shl r2, 5
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+;-----------------------------------------------------------------------------
+;void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral12v, 2, 4, 2
+ mov r2, r1
+ mov r3, r1
+ shl r2, 5
+ shl r3, 4
+ add r2, r3
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+;-----------------------------------------------------------------------------
+;void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral16v, 2, 3, 2
+ mov r2, r1
+ shl r2, 6
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+;-----------------------------------------------------------------------------
+;void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24v, 2, 4, 2
+ mov r2, r1
+ mov r3, r1
+ shl r2, 6
+ shl r3, 5
+ add r2, r3
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+;-----------------------------------------------------------------------------
+;void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32v, 2, 3, 2
+ mov r2, r1
+ shl r2, 7
+
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r2]
+ psubd m1, m0
+ movu [r0], m1
+ add r0, 32
+ sub r1, 8
+ jnz .loop
+ RET
+
+%macro INTEGRAL_FOUR_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4 0
+ movd xm0, [r1]
+ movd xm1, [r1 + 1]
+ pmovzxbw xm0, xm0
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 2]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 3]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0
+ pmovzxwd m0, [r1]
+ pmovzxwd m1, [r1 + 2]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 4]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 6]
+ paddd m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0
+ pmovzxwd xm0, [r1]
+ pmovzxwd xm1, [r1 + 2]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 4]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 6]
+ paddd xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral4h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 4 ;stride - 4
+ mov r4, r2
+ shr r4, 3
+
+.loop_8:
+ INTEGRAL_FOUR_HORIZONTAL_8_HBD
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ add r1, 16
+ add r0, 32
+ sub r2, 8
+ sub r4, 1
+ jnz .loop_8
+ INTEGRAL_FOUR_HORIZONTAL_4_HBD
+ movu xm1, [r0]
+ paddd xm0, xm1
+ movu [r0 + r3], xm0
+ RET
+
+%else
+cglobal integral4h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 4 ;stride - 4
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_FOUR_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 12
+ je .loop_12
+ cmp r2, 4
+ je .loop_4
+
+.loop_12:
+ INTEGRAL_FOUR_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd xm2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu xm1, [r0 + 32]
+ paddd xm2, xm1
+ movu [r0 + r3 + 32], xm2
+ jmp .end
+
+.loop_4:
+ INTEGRAL_FOUR_HORIZONTAL_4
+ pmovzxwd xm0, xm0
+ movu xm1, [r0]
+ paddd xm0, xm1
+ movu [r0 + r3], xm0
+ jmp .end
+
+.end
+ RET
+%endif
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 4]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 5]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 6]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 7]
+ paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8 0
+ pmovzxbw xm0, [r1]
+ pmovzxbw xm1, [r1 + 1]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 2]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 3]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 4]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 5]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 6]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 7]
+ paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8_HBD 0
+ pmovzxwd m0, [r1]
+ pmovzxwd m1, [r1 + 2]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 4]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 6]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 8]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 10]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 12]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 14]
+ paddd m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral8h, 3, 4, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 8 ;stride - 8
+
+.loop:
+ INTEGRAL_EIGHT_HORIZONTAL_8_HBD
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ add r1, 16
+ add r0, 32
+ sub r2, 8
+ jnz .loop
+ RET
+
+%else
+cglobal integral8h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 8 ;stride - 8
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_EIGHT_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 8
+ je .loop_8
+ jmp .end
+
+.loop_8:
+ INTEGRAL_EIGHT_HORIZONTAL_8
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ jmp .end
+
+.end
+ RET
+%endif
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 4]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 5]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 6]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 7]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 8]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 9]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 10]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 11]
+ paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4 0
+ movd xm0, [r1]
+ movd xm1, [r1 + 1]
+ pmovzxbw xm0, xm0
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 2]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 3]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 4]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 5]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 6]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 7]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 8]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 9]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 10]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+ movd xm1, [r1 + 11]
+ pmovzxbw xm1, xm1
+ paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_8_HBD 0
+ pmovzxwd m0, [r1]
+ pmovzxwd m1, [r1 + 2]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 4]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 6]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 8]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 10]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 12]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 14]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 16]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 18]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 20]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 22]
+ paddd m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4_HBD 0
+ pmovzxwd xm0, [r1]
+ pmovzxwd xm1, [r1 + 2]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 4]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 6]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 8]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 10]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 12]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 14]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 16]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 18]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 20]
+ paddd xm0, xm1
+ pmovzxwd xm1, [r1 + 22]
+ paddd xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral12h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 12 ;stride - 12
+ mov r4, r2
+ shr r4, 3
+
+.loop:
+ INTEGRAL_TWELVE_HORIZONTAL_8_HBD
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ add r1, 16
+ add r0, 32
+ sub r2, 8
+ sub r4, 1
+ jnz .loop
+ INTEGRAL_TWELVE_HORIZONTAL_4_HBD
+ movu xm1, [r0]
+ paddd xm0, xm1
+ movu [r0 + r3], xm0
+ RET
+
+%else
+cglobal integral12h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 12 ;stride - 12
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_TWELVE_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 12
+ je .loop_12
+ cmp r2, 4
+ je .loop_4
+
+.loop_12:
+ INTEGRAL_TWELVE_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd xm2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu xm1, [r0 + 32]
+ paddd xm2, xm1
+ movu [r0 + r3 + 32], xm2
+ jmp .end
+
+.loop_4:
+ INTEGRAL_TWELVE_HORIZONTAL_4
+ pmovzxwd xm0, xm0
+ movu xm1, [r0]
+ paddd xm0, xm1
+ movu [r0 + r3], xm0
+ jmp .end
+
+.end
+ RET
+%endif
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 4]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 5]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 6]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 7]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 8]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 9]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 10]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 11]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 12]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 13]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 14]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 15]
+ paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8 0
+ pmovzxbw xm0, [r1]
+ pmovzxbw xm1, [r1 + 1]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 2]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 3]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 4]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 5]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 6]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 7]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 8]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 9]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 10]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 11]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 12]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 13]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 14]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 15]
+ paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD 0
+ pmovzxwd m0, [r1]
+ pmovzxwd m1, [r1 + 2]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 4]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 6]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 8]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 10]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 12]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 14]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 16]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 18]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 20]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 22]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 24]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 26]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 28]
+ paddd m0, m1
+ pmovzxwd m1, [r1 + 30]
+ paddd m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral16h, 3, 4, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 16 ;stride - 16
+
+.loop:
+ INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ add r1, 16
+ add r0, 32
+ sub r2, 8
+ jnz .loop
+ RET
+
+%else
+cglobal integral16h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 16 ;stride - 16
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_SIXTEEN_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 8
+ je .loop_8
+ jmp .end
+
+.loop_8:
+ INTEGRAL_SIXTEEN_HORIZONTAL_8
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ jmp .end
+
+.end
+ RET
+%endif
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 4]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 5]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 6]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 7]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 8]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 9]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 10]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 11]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 12]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 13]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 14]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 15]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 16]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 17]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 18]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 19]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 20]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 21]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 22]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 23]
+ paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_8 0
+ pmovzxbw xm0, [r1]
+ pmovzxbw xm1, [r1 + 1]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 2]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 3]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 4]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 5]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 6]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 7]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 8]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 9]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 10]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 11]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 12]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 13]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 14]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 15]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 16]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 17]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 18]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 19]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 20]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 21]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 22]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 23]
+ paddw xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 24 ;stride - 24
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_TWENTYFOUR_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 8
+ je .loop_8
+ jmp .end
+
+.loop_8:
+ INTEGRAL_TWENTYFOUR_HORIZONTAL_8
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ jmp .end
+
+.end
+ RET
+
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_16 0
+ pmovzxbw m0, [r1]
+ pmovzxbw m1, [r1 + 1]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 2]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 3]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 4]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 5]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 6]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 7]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 8]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 9]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 10]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 11]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 12]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 13]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 14]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 15]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 16]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 17]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 18]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 19]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 20]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 21]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 22]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 23]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 24]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 25]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 26]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 27]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 28]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 29]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 30]
+ paddw m0, m1
+ pmovzxbw m1, [r1 + 31]
+ paddw m0, m1
+%endmacro
+
+
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_8 0
+ pmovzxbw xm0, [r1]
+ pmovzxbw xm1, [r1 + 1]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 2]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 3]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 4]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 5]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 6]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 7]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 8]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 9]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 10]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 11]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 12]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 13]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 14]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 15]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 16]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 17]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 18]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 19]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 20]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 21]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 22]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 23]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 24]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 25]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 26]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 27]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 28]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 29]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 30]
+ paddw xm0, xm1
+ pmovzxbw xm1, [r1 + 31]
+ paddw xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32h, 3, 5, 3
+ lea r3, [4 * r2]
+ sub r0, r3
+ sub r2, 32 ;stride - 32
+ mov r4, r2
+ shr r4, 4
+
+.loop_16:
+ INTEGRAL_THIRTYTWO_HORIZONTAL_16
+ vperm2i128 m2, m0, m0, 1
+ pmovzxwd m2, xm2
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ movu m1, [r0 + 32]
+ paddd m2, m1
+ movu [r0 + r3 + 32], m2
+ add r1, 16
+ add r0, 64
+ sub r2, 16
+ sub r4, 1
+ jnz .loop_16
+ cmp r2, 8
+ je .loop_8
+ jmp .end
+
+.loop_8:
+ INTEGRAL_THIRTYTWO_HORIZONTAL_8
+ pmovzxwd m0, xm0
+ movu m1, [r0]
+ paddd m0, m1
+ movu [r0 + r3], m0
+ jmp .end
+
+.end
+ RET
diff --git a/source/common/x86/seaintegral.h b/source/common/x86/seaintegral.h
new file mode 100644
index 0000000..dc98dc4
--- /dev/null
+++ b/source/common/x86/seaintegral.h
@@ -0,0 +1,42 @@
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh at multicorewareinc.com>
+* Jayashri Murugan <jayashri at multicorewareinc.com>
+* Praveen Tiwari <praveen at multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H
diff --git a/source/common/x86/x86inc.asm b/source/common/x86/x86inc.asm
index 0192e76..a7d96dd 100644
--- a/source/common/x86/x86inc.asm
+++ b/source/common/x86/x86inc.asm
@@ -76,10 +76,6 @@
SECTION .rodata align=%1
%endmacro
-%macro SECTION_TEXT 0-1 16
- SECTION .text align=%1
-%endmacro
-
%if WIN64
%define PIC
%elif ARCH_X86_64 == 0
@@ -139,6 +135,7 @@
%define r%1w %2w
%define r%1b %2b
%define r%1h %2h
+ %define %2q %2
%if %0 == 2
%define r%1m %2d
%define r%1mp %2
@@ -163,9 +160,9 @@
%define e%1h %3
%define r%1b %2
%define e%1b %2
-%if ARCH_X86_64 == 0
- %define r%1 e%1
-%endif
+ %if ARCH_X86_64 == 0
+ %define r%1 e%1
+ %endif
%endmacro
DECLARE_REG_SIZE ax, al, ah
@@ -275,7 +272,7 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
%macro ASSERT 1
%if (%1) == 0
- %error assert failed
+ %error assertion ``%1'' failed
%endif
%endmacro
@@ -365,9 +362,19 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
%ifnum %1
%if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
%if %1 > 0
+ ; Reserve an additional register for storing the original stack pointer, but avoid using
+ ; eax/rax for this purpose since it can potentially get overwritten as a return value.
%assign regs_used (regs_used + 1)
- %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2
- %warning "Stack pointer will overwrite register argument"
+ %if ARCH_X86_64 && regs_used == 7
+ %assign regs_used 8
+ %elif ARCH_X86_64 == 0 && regs_used == 1
+ %assign regs_used 2
+ %endif
+ %endif
+ %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
+ ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
+ ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
+ %assign regs_used 5 + UNIX64 * 3
%endif
%endif
%endif
@@ -396,10 +403,10 @@ DECLARE_REG 7, rdi, 64
DECLARE_REG 8, rsi, 72
DECLARE_REG 9, rbx, 80
DECLARE_REG 10, rbp, 88
-DECLARE_REG 11, R12, 96
-DECLARE_REG 12, R13, 104
-DECLARE_REG 13, R14, 112
-DECLARE_REG 14, R15, 120
+DECLARE_REG 11, R14, 96
+DECLARE_REG 12, R15, 104
+DECLARE_REG 13, R12, 112
+DECLARE_REG 14, R13, 120
%macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
%assign num_args %1
@@ -445,45 +452,46 @@ DECLARE_REG 14, R15, 120
WIN64_PUSH_XMM
%endmacro
-%macro WIN64_RESTORE_XMM_INTERNAL 1
+%macro WIN64_RESTORE_XMM_INTERNAL 0
%assign %%pad_size 0
%if xmm_regs_used > 8
%assign %%i xmm_regs_used
%rep xmm_regs_used-8
%assign %%i %%i-1
- movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32]
+ movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32]
%endrep
%endif
%if stack_size_padded > 0
%if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT
mov rsp, rstkm
%else
- add %1, stack_size_padded
+ add rsp, stack_size_padded
%assign %%pad_size stack_size_padded
%endif
%endif
%if xmm_regs_used > 7
- movaps xmm7, [%1 + stack_offset - %%pad_size + 24]
+ movaps xmm7, [rsp + stack_offset - %%pad_size + 24]
%endif
%if xmm_regs_used > 6
- movaps xmm6, [%1 + stack_offset - %%pad_size + 8]
+ movaps xmm6, [rsp + stack_offset - %%pad_size + 8]
%endif
%endmacro
-%macro WIN64_RESTORE_XMM 1
- WIN64_RESTORE_XMM_INTERNAL %1
+%macro WIN64_RESTORE_XMM 0
+ WIN64_RESTORE_XMM_INTERNAL
%assign stack_offset (stack_offset-stack_size_padded)
+ %assign stack_size_padded 0
%assign xmm_regs_used 0
%endmacro
%define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0
%macro RET 0
- WIN64_RESTORE_XMM_INTERNAL rsp
+ WIN64_RESTORE_XMM_INTERNAL
POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
-%if mmsize == 32
- vzeroupper
-%endif
+ %if mmsize == 32
+ vzeroupper
+ %endif
AUTO_REP_RET
%endmacro
@@ -500,10 +508,10 @@ DECLARE_REG 7, R10, 16
DECLARE_REG 8, R11, 24
DECLARE_REG 9, rbx, 32
DECLARE_REG 10, rbp, 40
-DECLARE_REG 11, R12, 48
-DECLARE_REG 12, R13, 56
-DECLARE_REG 13, R14, 64
-DECLARE_REG 14, R15, 72
+DECLARE_REG 11, R14, 48
+DECLARE_REG 12, R15, 56
+DECLARE_REG 13, R12, 64
+DECLARE_REG 14, R13, 72
%macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
%assign num_args %1
@@ -520,17 +528,17 @@ DECLARE_REG 14, R15, 72
%define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0
%macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
- mov rsp, rstkm
-%else
- add rsp, stack_size_padded
-%endif
-%endif
+ %if stack_size_padded > 0
+ %if required_stack_alignment > STACK_ALIGNMENT
+ mov rsp, rstkm
+ %else
+ add rsp, stack_size_padded
+ %endif
+ %endif
POP_IF_USED 14, 13, 12, 11, 10, 9
-%if mmsize == 32
- vzeroupper
-%endif
+ %if mmsize == 32
+ vzeroupper
+ %endif
AUTO_REP_RET
%endmacro
@@ -576,29 +584,29 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
%define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0
%macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
- mov rsp, rstkm
-%else
- add rsp, stack_size_padded
-%endif
-%endif
+ %if stack_size_padded > 0
+ %if required_stack_alignment > STACK_ALIGNMENT
+ mov rsp, rstkm
+ %else
+ add rsp, stack_size_padded
+ %endif
+ %endif
POP_IF_USED 6, 5, 4, 3
-%if mmsize == 32
- vzeroupper
-%endif
+ %if mmsize == 32
+ vzeroupper
+ %endif
AUTO_REP_RET
%endmacro
%endif ;======================================================================
%if WIN64 == 0
-%macro WIN64_SPILL_XMM 1
-%endmacro
-%macro WIN64_RESTORE_XMM 1
-%endmacro
-%macro WIN64_PUSH_XMM 0
-%endmacro
+ %macro WIN64_SPILL_XMM 1
+ %endmacro
+ %macro WIN64_RESTORE_XMM 0
+ %endmacro
+ %macro WIN64_PUSH_XMM 0
+ %endmacro
%endif
; On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
@@ -615,10 +623,8 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
%define last_branch_adr $$
%macro AUTO_REP_RET 0
- %ifndef cpuflags
- times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ != last_branch_adr.
- %elif notcpuflag(ssse3)
- times ((last_branch_adr-$)>>31)+1 rep
+ %if notcpuflag(ssse3)
+ times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ == last_branch_adr.
%endif
ret
%endmacro
@@ -627,8 +633,10 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
%rep %0
%macro %1 1-2 %1
%2 %1
- %%branch_instr:
- %xdefine last_branch_adr %%branch_instr
+ %if notcpuflag(ssse3)
+ %%branch_instr equ $
+ %xdefine last_branch_adr %%branch_instr
+ %endif
%endmacro
%rotate 1
%endrep
@@ -722,7 +730,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
; This is needed for ELF, otherwise the GNU linker assumes the stack is
; executable by default.
%ifidn __OUTPUT_FORMAT__,elf
-SECTION .note.GNU-stack noalloc noexec nowrite progbits
+ [SECTION .note.GNU-stack noalloc noexec nowrite progbits]
%endif
; cpuflags
@@ -734,27 +742,28 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%assign cpuflags_sse (1<<4) | cpuflags_mmx2
%assign cpuflags_sse2 (1<<5) | cpuflags_sse
%assign cpuflags_sse2slow (1<<6) | cpuflags_sse2
-%assign cpuflags_sse3 (1<<7) | cpuflags_sse2
-%assign cpuflags_ssse3 (1<<8) | cpuflags_sse3
-%assign cpuflags_sse4 (1<<9) | cpuflags_ssse3
-%assign cpuflags_sse42 (1<<10)| cpuflags_sse4
-%assign cpuflags_avx (1<<11)| cpuflags_sse42
-%assign cpuflags_xop (1<<12)| cpuflags_avx
-%assign cpuflags_fma4 (1<<13)| cpuflags_avx
-%assign cpuflags_avx2 (1<<14)| cpuflags_avx
+%assign cpuflags_lzcnt (1<<7) | cpuflags_sse2
+%assign cpuflags_sse3 (1<<8) | cpuflags_sse2
+%assign cpuflags_ssse3 (1<<9) | cpuflags_sse3
+%assign cpuflags_sse4 (1<<10)| cpuflags_ssse3
+%assign cpuflags_sse42 (1<<11)| cpuflags_sse4
+%assign cpuflags_avx (1<<12)| cpuflags_sse42
+%assign cpuflags_xop (1<<13)| cpuflags_avx
+%assign cpuflags_fma4 (1<<14)| cpuflags_avx
%assign cpuflags_fma3 (1<<15)| cpuflags_avx
+%assign cpuflags_bmi1 (1<<16)| cpuflags_avx | cpuflags_lzcnt
+%assign cpuflags_bmi2 (1<<17)| cpuflags_bmi1
+%assign cpuflags_avx2 (1<<18)| cpuflags_fma3 | cpuflags_bmi2
-%assign cpuflags_cache32 (1<<16)
-%assign cpuflags_cache64 (1<<17)
-%assign cpuflags_slowctz (1<<18)
-%assign cpuflags_lzcnt (1<<19)
-%assign cpuflags_aligned (1<<20) ; not a cpu feature, but a function variant
-%assign cpuflags_atom (1<<21)
-%assign cpuflags_bmi1 (1<<22)|cpuflags_lzcnt
-%assign cpuflags_bmi2 (1<<23)|cpuflags_bmi1
+%assign cpuflags_cache32 (1<<19)
+%assign cpuflags_cache64 (1<<20)
+%assign cpuflags_slowctz (1<<21)
+%assign cpuflags_aligned (1<<22) ; not a cpu feature, but a function variant
+%assign cpuflags_atom (1<<23)
-%define cpuflag(x) ((cpuflags & (cpuflags_ %+ x)) == (cpuflags_ %+ x))
-%define notcpuflag(x) ((cpuflags & (cpuflags_ %+ x)) != (cpuflags_ %+ x))
+; Returns a boolean value expressing whether or not the specified cpuflag is enabled.
+%define cpuflag(x) (((((cpuflags & (cpuflags_ %+ x)) ^ (cpuflags_ %+ x)) - 1) >> 31) & 1)
+%define notcpuflag(x) (cpuflag(x) ^ 1)
; Takes an arbitrary number of cpuflags from the above list.
; All subsequent functions (up to the next INIT_CPUFLAGS) is built for the specified cpu.
@@ -823,14 +832,14 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%define movnta movntq
%assign %%i 0
%rep 8
- CAT_XDEFINE m, %%i, mm %+ %%i
- CAT_XDEFINE nmm, %%i, %%i
- %assign %%i %%i+1
+ CAT_XDEFINE m, %%i, mm %+ %%i
+ CAT_XDEFINE nnmm, %%i, %%i
+ %assign %%i %%i+1
%endrep
%rep 8
- CAT_UNDEF m, %%i
- CAT_UNDEF nmm, %%i
- %assign %%i %%i+1
+ CAT_UNDEF m, %%i
+ CAT_UNDEF nnmm, %%i
+ %assign %%i %%i+1
%endrep
INIT_CPUFLAGS %1
%endmacro
@@ -841,7 +850,7 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%define mmsize 16
%define num_mmregs 8
%if ARCH_X86_64
- %define num_mmregs 16
+ %define num_mmregs 16
%endif
%define mova movdqa
%define movu movdqu
@@ -849,9 +858,9 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%define movnta movntdq
%assign %%i 0
%rep num_mmregs
- CAT_XDEFINE m, %%i, xmm %+ %%i
- CAT_XDEFINE nxmm, %%i, %%i
- %assign %%i %%i+1
+ CAT_XDEFINE m, %%i, xmm %+ %%i
+ CAT_XDEFINE nnxmm, %%i, %%i
+ %assign %%i %%i+1
%endrep
INIT_CPUFLAGS %1
%endmacro
@@ -862,7 +871,7 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%define mmsize 32
%define num_mmregs 8
%if ARCH_X86_64
- %define num_mmregs 16
+ %define num_mmregs 16
%endif
%define mova movdqa
%define movu movdqu
@@ -870,9 +879,9 @@ SECTION .note.GNU-stack noalloc noexec nowrite progbits
%define movnta movntdq
%assign %%i 0
%rep num_mmregs
- CAT_XDEFINE m, %%i, ymm %+ %%i
- CAT_XDEFINE nymm, %%i, %%i
- %assign %%i %%i+1
+ CAT_XDEFINE m, %%i, ymm %+ %%i
+ CAT_XDEFINE nnymm, %%i, %%i
+ %assign %%i %%i+1
%endrep
INIT_CPUFLAGS %1
%endmacro
@@ -889,8 +898,6 @@ INIT_XMM
%define ymmmm%1 mm%1
%define ymmxmm%1 xmm%1
%define ymmymm%1 ymm%1
- %define ymm%1xmm xmm%1
- %define xmm%1ymm ymm%1
%define xm%1 xmm %+ m%1
%define ym%1 ymm %+ m%1
%endmacro
@@ -898,7 +905,7 @@ INIT_XMM
%assign i 0
%rep 16
DECLARE_MMCAST i
-%assign i i+1
+ %assign i i+1
%endrep
; I often want to use macros that permute their arguments. e.g. there's no
@@ -916,23 +923,23 @@ INIT_XMM
; doesn't cost any cycles.
%macro PERMUTE 2-* ; takes a list of pairs to swap
-%rep %0/2
- %xdefine %%tmp%2 m%2
- %rotate 2
-%endrep
-%rep %0/2
- %xdefine m%1 %%tmp%2
- CAT_XDEFINE n, m%1, %1
- %rotate 2
-%endrep
+ %rep %0/2
+ %xdefine %%tmp%2 m%2
+ %rotate 2
+ %endrep
+ %rep %0/2
+ %xdefine m%1 %%tmp%2
+ CAT_XDEFINE nn, m%1, %1
+ %rotate 2
+ %endrep
%endmacro
%macro SWAP 2+ ; swaps a single chain (sometimes more concise than pairs)
-%ifnum %1 ; SWAP 0, 1, ...
- SWAP_INTERNAL_NUM %1, %2
-%else ; SWAP m0, m1, ...
- SWAP_INTERNAL_NAME %1, %2
-%endif
+ %ifnum %1 ; SWAP 0, 1, ...
+ SWAP_INTERNAL_NUM %1, %2
+ %else ; SWAP m0, m1, ...
+ SWAP_INTERNAL_NAME %1, %2
+ %endif
%endmacro
%macro SWAP_INTERNAL_NUM 2-*
@@ -940,17 +947,17 @@ INIT_XMM
%xdefine %%tmp m%1
%xdefine m%1 m%2
%xdefine m%2 %%tmp
- CAT_XDEFINE n, m%1, %1
- CAT_XDEFINE n, m%2, %2
- %rotate 1
+ CAT_XDEFINE nn, m%1, %1
+ CAT_XDEFINE nn, m%2, %2
+ %rotate 1
%endrep
%endmacro
%macro SWAP_INTERNAL_NAME 2-*
- %xdefine %%args n %+ %1
+ %xdefine %%args nn %+ %1
%rep %0-1
- %xdefine %%args %%args, n %+ %2
- %rotate 1
+ %xdefine %%args %%args, nn %+ %2
+ %rotate 1
%endrep
SWAP_INTERNAL_NUM %%args
%endmacro
@@ -967,7 +974,7 @@ INIT_XMM
%assign %%i 0
%rep num_mmregs
CAT_XDEFINE %%f, %%i, m %+ %%i
- %assign %%i %%i+1
+ %assign %%i %%i+1
%endrep
%endmacro
@@ -976,21 +983,25 @@ INIT_XMM
%assign %%i 0
%rep num_mmregs
CAT_XDEFINE m, %%i, %1_m %+ %%i
- CAT_XDEFINE n, m %+ %%i, %%i
- %assign %%i %%i+1
+ CAT_XDEFINE nn, m %+ %%i, %%i
+ %assign %%i %%i+1
%endrep
%endif
%endmacro
; Append cpuflags to the callee's name iff the appended name is known and the plain name isn't
%macro call 1
- call_internal %1, %1 %+ SUFFIX
+ %ifid %1
+ call_internal %1 %+ SUFFIX, %1
+ %else
+ call %1
+ %endif
%endmacro
%macro call_internal 2
- %xdefine %%i %1
- %ifndef cglobaled_%1
- %ifdef cglobaled_%2
- %xdefine %%i %2
+ %xdefine %%i %2
+ %ifndef cglobaled_%2
+ %ifdef cglobaled_%1
+ %xdefine %%i %1
%endif
%endif
call %%i
@@ -1033,7 +1044,7 @@ INIT_XMM
%endif
CAT_XDEFINE sizeofxmm, i, 16
CAT_XDEFINE sizeofymm, i, 32
-%assign i i+1
+ %assign i i+1
%endrep
%undef i
@@ -1051,7 +1062,7 @@ INIT_XMM
;%1 == instruction
;%2 == minimal instruction set
;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
;%6+: operands
%macro RUN_AVX_INSTR 6-9+
@@ -1075,6 +1086,8 @@ INIT_XMM
%ifdef cpuname
%if notcpuflag(%2)
%error use of ``%1'' %2 instruction in cpuname function: current_function
+ %elif cpuflags_%2 < cpuflags_sse && notcpuflag(sse2) && __sizeofreg > 8
+ %error use of ``%1'' sse2 instruction in cpuname function: current_function
%endif
%endif
%endif
@@ -1082,14 +1095,12 @@ INIT_XMM
%if __emulate_avx
%xdefine __src1 %7
%xdefine __src2 %8
- %ifnidn %6, %7
- %if %0 >= 9
- CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9
- %else
- CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
- %endif
- %if %5 && %4 == 0
- %ifnid %8
+ %if %5 && %4 == 0
+ %ifnidn %6, %7
+ %ifidn %6, %8
+ %xdefine __src1 %8
+ %xdefine __src2 %7
+ %elifnnum sizeof%8
; 3-operand AVX instructions with a memory arg can only have it in src2,
; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
; So, if the instruction is commutative with a memory arg, swap them.
@@ -1097,6 +1108,13 @@ INIT_XMM
%xdefine __src2 %7
%endif
%endif
+ %endif
+ %ifnidn %6, __src1
+ %if %0 >= 9
+ CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, __src2, %9
+ %else
+ CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, __src2
+ %endif
%if __sizeofreg == 8
MOVQ %6, __src1
%elif %3
@@ -1124,9 +1142,9 @@ INIT_XMM
;%1 == instruction
;%2 == minimal instruction set
;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
-%macro AVX_INSTR 1-5 fnord, 0, 1, 0
+%macro AVX_INSTR 1-5 fnord, 0, 255, 0
%macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5
%ifidn %2, fnord
RUN_AVX_INSTR %6, %7, %8, %9, %10, %1
@@ -1146,8 +1164,8 @@ INIT_XMM
; Non-destructive instructions are written without parameters
AVX_INSTR addpd, sse2, 1, 0, 1
AVX_INSTR addps, sse, 1, 0, 1
-AVX_INSTR addsd, sse2, 1, 0, 1
-AVX_INSTR addss, sse, 1, 0, 1
+AVX_INSTR addsd, sse2, 1, 0, 0
+AVX_INSTR addss, sse, 1, 0, 0
AVX_INSTR addsubpd, sse3, 1, 0, 0
AVX_INSTR addsubps, sse3, 1, 0, 0
AVX_INSTR aesdec, fnord, 0, 0, 0
@@ -1160,10 +1178,10 @@ AVX_INSTR andnpd, sse2, 1, 0, 0
AVX_INSTR andnps, sse, 1, 0, 0
AVX_INSTR andpd, sse2, 1, 0, 1
AVX_INSTR andps, sse, 1, 0, 1
-AVX_INSTR blendpd, sse4, 1, 0, 0
-AVX_INSTR blendps, sse4, 1, 0, 0
-AVX_INSTR blendvpd, sse4, 1, 0, 0
-AVX_INSTR blendvps, sse4, 1, 0, 0
+AVX_INSTR blendpd, sse4, 1, 1, 0
+AVX_INSTR blendps, sse4, 1, 1, 0
+AVX_INSTR blendvpd, sse4 ; can't be emulated
+AVX_INSTR blendvps, sse4 ; can't be emulated
AVX_INSTR cmppd, sse2, 1, 1, 0
AVX_INSTR cmpps, sse, 1, 1, 0
AVX_INSTR cmpsd, sse2, 1, 1, 0
@@ -1177,10 +1195,10 @@ AVX_INSTR cvtpd2ps, sse2
AVX_INSTR cvtps2dq, sse2
AVX_INSTR cvtps2pd, sse2
AVX_INSTR cvtsd2si, sse2
-AVX_INSTR cvtsd2ss, sse2
-AVX_INSTR cvtsi2sd, sse2
-AVX_INSTR cvtsi2ss, sse
-AVX_INSTR cvtss2sd, sse2
+AVX_INSTR cvtsd2ss, sse2, 1, 0, 0
+AVX_INSTR cvtsi2sd, sse2, 1, 0, 0
+AVX_INSTR cvtsi2ss, sse, 1, 0, 0
+AVX_INSTR cvtss2sd, sse2, 1, 0, 0
AVX_INSTR cvtss2si, sse
AVX_INSTR cvttpd2dq, sse2
AVX_INSTR cvttps2dq, sse2
@@ -1203,15 +1221,15 @@ AVX_INSTR ldmxcsr, sse
AVX_INSTR maskmovdqu, sse2
AVX_INSTR maxpd, sse2, 1, 0, 1
AVX_INSTR maxps, sse, 1, 0, 1
-AVX_INSTR maxsd, sse2, 1, 0, 1
-AVX_INSTR maxss, sse, 1, 0, 1
+AVX_INSTR maxsd, sse2, 1, 0, 0
+AVX_INSTR maxss, sse, 1, 0, 0
AVX_INSTR minpd, sse2, 1, 0, 1
AVX_INSTR minps, sse, 1, 0, 1
-AVX_INSTR minsd, sse2, 1, 0, 1
-AVX_INSTR minss, sse, 1, 0, 1
+AVX_INSTR minsd, sse2, 1, 0, 0
+AVX_INSTR minss, sse, 1, 0, 0
AVX_INSTR movapd, sse2
AVX_INSTR movaps, sse
-AVX_INSTR movd
+AVX_INSTR movd, mmx
AVX_INSTR movddup, sse3
AVX_INSTR movdqa, sse2
AVX_INSTR movdqu, sse2
@@ -1227,18 +1245,18 @@ AVX_INSTR movntdq, sse2
AVX_INSTR movntdqa, sse4
AVX_INSTR movntpd, sse2
AVX_INSTR movntps, sse
-AVX_INSTR movq
+AVX_INSTR movq, mmx
AVX_INSTR movsd, sse2, 1, 0, 0
AVX_INSTR movshdup, sse3
AVX_INSTR movsldup, sse3
AVX_INSTR movss, sse, 1, 0, 0
AVX_INSTR movupd, sse2
AVX_INSTR movups, sse
-AVX_INSTR mpsadbw, sse4
+AVX_INSTR mpsadbw, sse4, 0, 1, 0
AVX_INSTR mulpd, sse2, 1, 0, 1
AVX_INSTR mulps, sse, 1, 0, 1
-AVX_INSTR mulsd, sse2, 1, 0, 1
-AVX_INSTR mulss, sse, 1, 0, 1
+AVX_INSTR mulsd, sse2, 1, 0, 0
+AVX_INSTR mulss, sse, 1, 0, 0
AVX_INSTR orpd, sse2, 1, 0, 1
AVX_INSTR orps, sse, 1, 0, 1
AVX_INSTR pabsb, ssse3
@@ -1256,14 +1274,18 @@ AVX_INSTR paddsb, mmx, 0, 0, 1
AVX_INSTR paddsw, mmx, 0, 0, 1
AVX_INSTR paddusb, mmx, 0, 0, 1
AVX_INSTR paddusw, mmx, 0, 0, 1
-AVX_INSTR palignr, ssse3
+AVX_INSTR palignr, ssse3, 0, 1, 0
AVX_INSTR pand, mmx, 0, 0, 1
AVX_INSTR pandn, mmx, 0, 0, 0
AVX_INSTR pavgb, mmx2, 0, 0, 1
AVX_INSTR pavgw, mmx2, 0, 0, 1
-AVX_INSTR pblendvb, sse4, 0, 0, 0
-AVX_INSTR pblendw, sse4
-AVX_INSTR pclmulqdq
+AVX_INSTR pblendvb, sse4 ; can't be emulated
+AVX_INSTR pblendw, sse4, 0, 1, 0
+AVX_INSTR pclmulqdq, fnord, 0, 1, 0
+AVX_INSTR pclmulhqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmulhqlqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqlqdq, fnord, 0, 0, 0
AVX_INSTR pcmpestri, sse42
AVX_INSTR pcmpestrm, sse42
AVX_INSTR pcmpistri, sse42
@@ -1287,10 +1309,10 @@ AVX_INSTR phminposuw, sse4
AVX_INSTR phsubw, ssse3, 0, 0, 0
AVX_INSTR phsubd, ssse3, 0, 0, 0
AVX_INSTR phsubsw, ssse3, 0, 0, 0
-AVX_INSTR pinsrb, sse4
-AVX_INSTR pinsrd, sse4
-AVX_INSTR pinsrq, sse4
-AVX_INSTR pinsrw, mmx2
+AVX_INSTR pinsrb, sse4, 0, 1, 0
+AVX_INSTR pinsrd, sse4, 0, 1, 0
+AVX_INSTR pinsrq, sse4, 0, 1, 0
+AVX_INSTR pinsrw, mmx2, 0, 1, 0
AVX_INSTR pmaddwd, mmx, 0, 0, 1
AVX_INSTR pmaddubsw, ssse3, 0, 0, 0
AVX_INSTR pmaxsb, sse4, 0, 0, 1
@@ -1362,18 +1384,18 @@ AVX_INSTR punpcklwd, mmx, 0, 0, 0
AVX_INSTR punpckldq, mmx, 0, 0, 0
AVX_INSTR punpcklqdq, sse2, 0, 0, 0
AVX_INSTR pxor, mmx, 0, 0, 1
-AVX_INSTR rcpps, sse, 1, 0, 0
+AVX_INSTR rcpps, sse
AVX_INSTR rcpss, sse, 1, 0, 0
AVX_INSTR roundpd, sse4
AVX_INSTR roundps, sse4
-AVX_INSTR roundsd, sse4
-AVX_INSTR roundss, sse4
-AVX_INSTR rsqrtps, sse, 1, 0, 0
+AVX_INSTR roundsd, sse4, 1, 1, 0
+AVX_INSTR roundss, sse4, 1, 1, 0
+AVX_INSTR rsqrtps, sse
AVX_INSTR rsqrtss, sse, 1, 0, 0
AVX_INSTR shufpd, sse2, 1, 1, 0
AVX_INSTR shufps, sse, 1, 1, 0
-AVX_INSTR sqrtpd, sse2, 1, 0, 0
-AVX_INSTR sqrtps, sse, 1, 0, 0
+AVX_INSTR sqrtpd, sse2
+AVX_INSTR sqrtps, sse
AVX_INSTR sqrtsd, sse2, 1, 0, 0
AVX_INSTR sqrtss, sse, 1, 0, 0
AVX_INSTR stmxcsr, sse
@@ -1408,7 +1430,7 @@ AVX_INSTR pfmul, 3dnow, 1, 0, 1
%else
CAT_XDEFINE q, j, i
%endif
-%assign i i+1
+ %assign i i+1
%endrep
%undef i
%undef j
@@ -1431,55 +1453,52 @@ FMA_INSTR pmacsdd, pmulld, paddd ; sse4 emulation
FMA_INSTR pmacsdql, pmuldq, paddq ; sse4 emulation
FMA_INSTR pmadcswd, pmaddwd, paddd
-; convert FMA4 to FMA3 if possible
-%macro FMA4_INSTR 4
- %macro %1 4-8 %1, %2, %3, %4
- %if cpuflag(fma4)
- v%5 %1, %2, %3, %4
- %elifidn %1, %2
- v%6 %1, %4, %3 ; %1 = %1 * %3 + %4
- %elifidn %1, %3
- v%7 %1, %2, %4 ; %1 = %2 * %1 + %4
- %elifidn %1, %4
- v%8 %1, %2, %3 ; %1 = %2 * %3 + %1
+; Macros for consolidating FMA3 and FMA4 using 4-operand (dst, src1, src2, src3) syntax.
+; FMA3 is only possible if dst is the same as one of the src registers.
+; Either src2 or src3 can be a memory operand.
+%macro FMA4_INSTR 2-*
+ %push fma4_instr
+ %xdefine %$prefix %1
+ %rep %0 - 1
+ %macro %$prefix%2 4-6 %$prefix, %2
+ %if notcpuflag(fma3) && notcpuflag(fma4)
+ %error use of ``%5%6'' fma instruction in cpuname function: current_function
+ %elif cpuflag(fma4)
+ v%5%6 %1, %2, %3, %4
+ %elifidn %1, %2
+ ; If %3 or %4 is a memory operand it needs to be encoded as the last operand.
+ %ifid %3
+ v%{5}213%6 %2, %3, %4
+ %else
+ v%{5}132%6 %2, %4, %3
+ %endif
+ %elifidn %1, %3
+ v%{5}213%6 %3, %2, %4
+ %elifidn %1, %4
+ v%{5}231%6 %4, %2, %3
+ %else
+ %error fma3 emulation of ``%5%6 %1, %2, %3, %4'' is not supported
+ %endif
+ %endmacro
+ %rotate 1
+ %endrep
+ %pop
+%endmacro
+
+FMA4_INSTR fmadd, pd, ps, sd, ss
+FMA4_INSTR fmaddsub, pd, ps
+FMA4_INSTR fmsub, pd, ps, sd, ss
+FMA4_INSTR fmsubadd, pd, ps
+FMA4_INSTR fnmadd, pd, ps, sd, ss
+FMA4_INSTR fnmsub, pd, ps, sd, ss
+
+; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0)
+%if __YASM_VERSION_ID__ < 0x01030000 && ARCH_X86_64 == 0
+ %macro vpbroadcastq 2
+ %if sizeof%1 == 16
+ movddup %1, %2
%else
- %error fma3 emulation of ``%5 %1, %2, %3, %4'' is not supported
+ vbroadcastsd %1, %2
%endif
%endmacro
-%endmacro
-
-FMA4_INSTR fmaddpd, fmadd132pd, fmadd213pd, fmadd231pd
-FMA4_INSTR fmaddps, fmadd132ps, fmadd213ps, fmadd231ps
-FMA4_INSTR fmaddsd, fmadd132sd, fmadd213sd, fmadd231sd
-FMA4_INSTR fmaddss, fmadd132ss, fmadd213ss, fmadd231ss
-
-FMA4_INSTR fmaddsubpd, fmaddsub132pd, fmaddsub213pd, fmaddsub231pd
-FMA4_INSTR fmaddsubps, fmaddsub132ps, fmaddsub213ps, fmaddsub231ps
-FMA4_INSTR fmsubaddpd, fmsubadd132pd, fmsubadd213pd, fmsubadd231pd
-FMA4_INSTR fmsubaddps, fmsubadd132ps, fmsubadd213ps, fmsubadd231ps
-
-FMA4_INSTR fmsubpd, fmsub132pd, fmsub213pd, fmsub231pd
-FMA4_INSTR fmsubps, fmsub132ps, fmsub213ps, fmsub231ps
-FMA4_INSTR fmsubsd, fmsub132sd, fmsub213sd, fmsub231sd
-FMA4_INSTR fmsubss, fmsub132ss, fmsub213ss, fmsub231ss
-
-FMA4_INSTR fnmaddpd, fnmadd132pd, fnmadd213pd, fnmadd231pd
-FMA4_INSTR fnmaddps, fnmadd132ps, fnmadd213ps, fnmadd231ps
-FMA4_INSTR fnmaddsd, fnmadd132sd, fnmadd213sd, fnmadd231sd
-FMA4_INSTR fnmaddss, fnmadd132ss, fnmadd213ss, fnmadd231ss
-
-FMA4_INSTR fnmsubpd, fnmsub132pd, fnmsub213pd, fnmsub231pd
-FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps
-FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd
-FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss
-
-; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug
-%if ARCH_X86_64 == 0
-%macro vpbroadcastq 2
-%if sizeof%1 == 16
- movddup %1, %2
-%else
- vbroadcastsd %1, %2
-%endif
-%endmacro
%endif
diff --git a/source/dynamicHDR10/BasicStructures.cpp b/source/dynamicHDR10/BasicStructures.cpp
deleted file mode 100644
index 31a074f..0000000
--- a/source/dynamicHDR10/BasicStructures.cpp
+++ /dev/null
@@ -1,40 +0,0 @@
-/**
- * @file BasicStructures.cpp
- * @brief Defines the structure of metadata parameters
- * @author Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date 03/01/2017
- * @version 0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
- float averageLuminance = 0.0;
- float maxRLuminance = 0.0;
- float maxGLuminance = 0.0;
- float maxBLuminance = 0.0;
- int order;
- std::vector<unsigned int> percentiles;
-};
-
-
-
diff --git a/source/dynamicHDR10/BasicStructures.h b/source/dynamicHDR10/BasicStructures.h
index e2451a9..c139b22 100644
--- a/source/dynamicHDR10/BasicStructures.h
+++ b/source/dynamicHDR10/BasicStructures.h
@@ -35,16 +35,26 @@ struct LuminanceParameters
float maxRLuminance = 0.0;
float maxGLuminance = 0.0;
float maxBLuminance = 0.0;
- int order;
+ int order = 0;
std::vector<unsigned int> percentiles;
};
struct BezierCurveData
{
- int order;
- int sPx;
- int sPy;
+ int order = 0;
+ int sPx = 0;
+ int sPy = 0;
std::vector<int> coeff;
};
+struct PercentileLuminance{
+
+ float averageLuminance = 0.0;
+ float maxRLuminance = 0.0;
+ float maxGLuminance = 0.0;
+ float maxBLuminance = 0.0;
+ int order = 0;
+ std::vector<unsigned int> percentiles;
+};
+
#endif // BASICSTRUCTURES_H
diff --git a/source/dynamicHDR10/CMakeLists.txt b/source/dynamicHDR10/CMakeLists.txt
index 5e6eef2..22fb79d 100644
--- a/source/dynamicHDR10/CMakeLists.txt
+++ b/source/dynamicHDR10/CMakeLists.txt
@@ -1,8 +1,8 @@
# vim: syntax=cmake
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
add_library(dynamicHDR10 OBJECT
- BasicStructures.cpp BasicStructures.h
+ BasicStructures.h
json11/json11.cpp json11/json11.h
JsonHelper.cpp JsonHelper.h
metadataFromJson.cpp metadataFromJson.h
@@ -10,7 +10,6 @@ add_library(dynamicHDR10 OBJECT
hdr10plus.h
api.cpp )
-else()
cmake_minimum_required (VERSION 2.8.11)
project(dynamicHDR10)
include(CheckIncludeFiles)
@@ -150,26 +149,5 @@ set(BIN_INSTALL_DIR bin CACHE STRING "Install location of executables")
option(ENABLE_SHARED "Build shared library" OFF)
-if(ENABLE_SHARED)
- add_library(dynamicHDR10 SHARED
- json11/json11.cpp json11/json11.h
- BasicStructures.cpp BasicStructures.h
- JsonHelper.cpp JsonHelper.h
- metadataFromJson.cpp metadataFromJson.h
- SeiMetadataDictionary.cpp SeiMetadataDictionary.h
- hdr10plus.h api.cpp )
-else()
- add_library(dynamicHDR10 STATIC
- json11/json11.cpp json11/json11.h
- BasicStructures.cpp BasicStructures.h
- JsonHelper.cpp JsonHelper.h
- metadataFromJson.cpp metadataFromJson.h
- SeiMetadataDictionary.cpp SeiMetadataDictionary.h
- hdr10plus.h api.cpp )
-endif()
-
-install (TARGETS dynamicHDR10
- LIBRARY DESTINATION ${LIB_INSTALL_DIR}
- ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
install(FILES hdr10plus.h DESTINATION include)
endif()
\ No newline at end of file
diff --git a/source/dynamicHDR10/json11/json11.cpp b/source/dynamicHDR10/json11/json11.cpp
index 9cbb2d1..3031fa9 100644
--- a/source/dynamicHDR10/json11/json11.cpp
+++ b/source/dynamicHDR10/json11/json11.cpp
@@ -26,6 +26,12 @@
#include <cstdio>
#include <limits>
+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
namespace json11 {
static const int max_depth = 200;
@@ -435,7 +441,7 @@ struct JsonParser final {
char get_next_token() {
consume_garbage();
if (i == str.size())
- return fail("unexpected end of input", 0);
+ return fail("unexpected end of input", '0');
return str[i++];
}
@@ -472,7 +478,7 @@ struct JsonParser final {
string parse_string() {
string out;
long last_escaped_codepoint = -1;
- while (true) {
+ for (;;) {
if (i == str.size())
return fail("unexpected end of input in string", "");
@@ -665,7 +671,7 @@ struct JsonParser final {
if (ch == '}')
return data;
- while (1) {
+ for (;;) {
if (ch != '"')
return fail("expected '\"' in object, got " + esc(ch));
@@ -698,7 +704,7 @@ struct JsonParser final {
if (ch == ']')
return data;
- while (1) {
+ for (;;) {
i--;
data.push_back(parse_json(depth + 1));
if (failed)
diff --git a/source/dynamicHDR10/metadataFromJson.cpp b/source/dynamicHDR10/metadataFromJson.cpp
index 9a2a437..f33067a 100644
--- a/source/dynamicHDR10/metadataFromJson.cpp
+++ b/source/dynamicHDR10/metadataFromJson.cpp
@@ -168,7 +168,7 @@ public:
{
int payloadBytes = 1;
- for(;payload > 0xFF; payload -= 0xFF, ++payloadBytes);
+ for(;payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
if(payloadBytes > 1)
{
diff --git a/source/encoder/CMakeLists.txt b/source/encoder/CMakeLists.txt
index d91af8d..0b079ae 100644
--- a/source/encoder/CMakeLists.txt
+++ b/source/encoder/CMakeLists.txt
@@ -43,4 +43,5 @@ add_library(encoder OBJECT ../x265.h
reference.cpp reference.h
encoder.cpp encoder.h
api.cpp
- weightPrediction.cpp)
+ weightPrediction.cpp
+ ../x265-extras.cpp ../x265-extras.h)
diff --git a/source/encoder/analysis.cpp b/source/encoder/analysis.cpp
index 858a84d..5dabe33 100644
--- a/source/encoder/analysis.cpp
+++ b/source/encoder/analysis.cpp
@@ -75,6 +75,7 @@ Analysis::Analysis()
m_reuseInterDataCTU = NULL;
m_reuseRef = NULL;
m_bHD = false;
+ m_evaluateInter = 0;
}
bool Analysis::create(ThreadLocalData *tld)
@@ -89,19 +90,19 @@ bool Analysis::create(ThreadLocalData *tld)
cacheCost = X265_MALLOC(uint64_t, costArrSize);
int csp = m_param->internalCsp;
- uint32_t cuSize = g_maxCUSize;
+ uint32_t cuSize = m_param->maxCUSize;
bool ok = true;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1)
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1)
{
ModeDepth &md = m_modeDepth[depth];
- md.cuMemPool.create(depth, csp, MAX_PRED_TYPES);
+ md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param);
ok &= md.fencYuv.create(cuSize, csp);
for (int j = 0; j < MAX_PRED_TYPES; j++)
{
- md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j);
+ md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j);
ok &= md.pred[j].predYuv.create(cuSize, csp);
ok &= md.pred[j].reconYuv.create(cuSize, csp);
md.pred[j].fencYuv = &md.fencYuv;
@@ -115,7 +116,7 @@ bool Analysis::create(ThreadLocalData *tld)
void Analysis::destroy()
{
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
m_modeDepth[i].cuMemPool.destroy();
m_modeDepth[i].fencYuv.destroy();
@@ -150,6 +151,41 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
calculateNormFactor(ctu, qp);
uint32_t numPartition = ctu.m_numPartitions;
+ if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr))
+ {
+ x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr;
+ if (ctuTemp->ctuPartitions)
+ {
+ int32_t depthIdx = 0;
+ uint32_t maxNum8x8Partitions = 64;
+ uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr];
+ uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+ int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+ do
+ {
+ uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx];
+ uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx));
+ int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx];
+ memset(depthInfoPtr, depth, sizeof(uint8_t) * numPartition >> 2 * depth);
+ memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth);
+ memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth);
+ for (uint32_t l = 0; l < numPartition >> 2 * depth; l++)
+ prevCtuInfoChangePtr[l] = prevCtuInfoChange;
+ depthInfoPtr += ctu.m_numPartitions >> 2 * depth;
+ contentInfoPtr += ctu.m_numPartitions >> 2 * depth;
+ prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth;
+ depthIdx++;
+ } while (ctuTemp->ctuPartitions[depthIdx] != 0);
+
+ m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+ m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+ memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition);
+ //Calculate log2CUSize from depth
+ for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
+ ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
+ }
+ }
+
if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
{
m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
@@ -167,19 +203,19 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
}
}
- if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10)
+ if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10)
{
int numPredDir = m_slice->isInterP() ? 1 : 2;
m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions];
m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions];
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions];
m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions];
}
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++)
m_reuseRef[i] = -1;
}
@@ -188,7 +224,7 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
if (m_slice->m_sliceType == I_SLICE)
{
analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1)
{
memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
@@ -200,8 +236,8 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
else
{
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
- ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol
- && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol)
+ ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol
+ && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol)
compressIntraCU(ctu, cuGeom, qp);
else if (!m_param->rdLevel)
{
@@ -214,7 +250,7 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
/* generate residual for entire CTU at once and copy to reconPic */
encodeResidue(ctu, cuGeom);
}
- else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10)
+ else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10)
{
analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
int posCTU = ctu.m_cuAddr * numPartition;
@@ -229,7 +265,7 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
}
//Calculate log2CUSize from depth
for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
- ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i];
+ ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
qprdRefine (ctu, cuGeom, qp, qp);
return *m_modeDepth[0].bestMode;
@@ -245,9 +281,69 @@ Mode& Analysis::compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, con
if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
qprdRefine(ctu, cuGeom, qp, qp);
+ if (m_param->csvLogLevel >= 2)
+ collectPUStatistics(ctu, cuGeom);
+
return *m_modeDepth[0].bestMode;
}
+void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom)
+{
+ uint8_t depth = 0;
+ uint8_t partSize = 0;
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+ {
+ depth = ctu.m_cuDepth[absPartIdx];
+ partSize = ctu.m_partSize[absPartIdx];
+ uint32_t numPU = nbPartsTable[(int)partSize];
+ int shift = 2 * (m_param->maxCUDepth + 1 - depth);
+ for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
+ {
+ PredictionUnit pu(ctu, cuGeom, puIdx);
+ int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx);
+ int mode = 1;
+ if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN)
+ mode = 2;
+ else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N)
+ mode = 3;
+
+ if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP)
+ {
+ ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift);
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+ }
+ else if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA)
+ {
+ if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN)
+ {
+ ctu.m_encData->m_frameStats.cnt4x4++;
+ ctu.m_encData->m_frameStats.totalPu[4]++;
+ }
+ else
+ {
+ ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift);
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+ }
+ }
+ else if (mode == 3)
+ {
+ ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift);
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+ break;
+ }
+ else
+ {
+ if (ctu.m_mergeFlag[puabsPartIdx + absPartIdx])
+ ctu.m_encData->m_frameStats.cntMergePu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
+ else
+ ctu.m_encData->m_frameStats.cntInterPu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
+
+ ctu.m_encData->m_frameStats.totalPu[depth] += (1 << shift) / mode;
+ }
+ }
+ }
+}
+
int32_t Analysis::loadTUDepth(CUGeom cuGeom, CUData parentCTU)
{
float predDepth = 0;
@@ -336,7 +432,7 @@ void Analysis::qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t
int lambdaQP = lqp;
bool doQPRefine = (bDecidedDepth && depth <= m_slice->m_pps->maxCuDQPDepth) || (!bDecidedDepth && depth == m_slice->m_pps->maxCuDQPDepth);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
doQPRefine = false;
if (doQPRefine)
@@ -400,6 +496,13 @@ void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, in
bool bAlreadyDecided = parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] != (uint8_t)ALL_IDX;
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
+ int split = 0;
+ if (m_param->intraRefine)
+ {
+ split = ((cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)) && bDecidedDepth);
+ if (cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize]) && !bDecidedDepth)
+ bAlreadyDecided = false;
+ }
if (bAlreadyDecided)
{
@@ -408,8 +511,11 @@ void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, in
Mode& mode = md.pred[0];
md.bestMode = &mode;
mode.cu.initSubCU(parentCTU, cuGeom, qp);
- memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
- memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
+ {
+ memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ }
checkIntra(mode, cuGeom, (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx]);
if (m_bTryLossless)
@@ -440,7 +546,7 @@ void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, in
}
// stop recursion if we reach the depth of previous analysis decision
- mightSplit &= !(bAlreadyDecided && bDecidedDepth);
+ mightSplit &= !(bAlreadyDecided && bDecidedDepth) || split;
if (mightSplit)
{
@@ -501,7 +607,7 @@ void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, in
}
/* Save Intra CUs TU depth only when analysis mode is OFF */
- if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisMode)
+ if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisReuseMode)
{
CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
int8_t maxTUDepth = -1;
@@ -1017,11 +1123,21 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);
+ bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
bool skipModes = false; /* Skip any remaining mode analyses at current depth */
bool skipRecursion = false; /* Skip recursion */
bool splitIntra = true;
bool skipRectAmp = false;
bool chooseMerge = false;
+ bool bCtuInfoCheck = false;
+ int sameContentRef = 0;
+
+ if (m_evaluateInter == 1)
+ {
+ skipRectAmp = !!md.bestMode;
+ mightSplit &= false;
+ minDepth = depth;
+ }
if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
m_maxTUDepth = loadTUDepth(cuGeom, parentCTU);
@@ -1040,7 +1156,54 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
md.pred[PRED_2Nx2N].sa8dCost = 0;
}
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+ if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
+ {
+ if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
+ sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
+ if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
+ {
+ mightNotSplit &= bDecidedDepth;
+ bCtuInfoCheck = skipRecursion = false;
+ skipModes = true;
+ }
+ else if (mightNotSplit && bDecidedDepth)
+ {
+ if (m_additionalCtuInfo[cuGeom.absPartIdx])
+ {
+ bCtuInfoCheck = skipRecursion = true;
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+ checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+ if (!sameContentRef)
+ {
+ if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
+ {
+ qp -= int32_t(0.04 * qp);
+ setLambdaFromQP(parentCTU, qp);
+ }
+ if (m_param->bCTUInfo & 4)
+ skipModes = false;
+ }
+ if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
+ {
+ if (m_param->rdLevel)
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+ if ((m_param->bCTUInfo & 4) && sameContentRef)
+ skipModes = md.bestMode && true;
+ }
+ }
+ else
+ {
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+ checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+ if (m_param->rdLevel)
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+ }
+ mightSplit &= !bDecidedDepth;
+ }
+ }
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
{
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
{
@@ -1054,7 +1217,7 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
if (m_param->rdLevel)
skipModes = m_param->bEnableEarlySkip && md.bestMode;
}
- if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
+ if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
{
if (m_reuseModes[cuGeom.absPartIdx] != MODE_INTRA && m_reuseModes[cuGeom.absPartIdx] != 4)
{
@@ -1082,7 +1245,7 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
}
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
- if (mightNotSplit && depth >= minDepth && !md.bestMode) /* TODO: Re-evaluate if analysis load/save still works */
+ if (mightNotSplit && depth >= minDepth && !md.bestMode && !bCtuInfoCheck) /* TODO: Re-evaluate if analysis load/save still works */
{
/* Compute Merge Cost */
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
@@ -1092,7 +1255,7 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
}
- if (md.bestMode && m_param->bEnableRecursionSkip)
+ if (md.bestMode && m_param->bEnableRecursionSkip && !bCtuInfoCheck)
{
skipRecursion = md.bestMode->cu.isSkipped(0);
if (mightSplit && depth >= minDepth && !skipRecursion)
@@ -1107,6 +1270,8 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
/* Step 2. Evaluate each of the 4 split sub-blocks in series */
if (mightSplit && !skipRecursion)
{
+ if (bCtuInfoCheck && m_param->bCTUInfo & 2)
+ qp = int((1 / 0.96) * qp + 0.5);
Mode* splitPred = &md.pred[PRED_SPLIT];
splitPred->initCosts();
CUData* splitCU = &splitPred->cu;
@@ -1162,7 +1327,7 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
* 2 3 */
uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
/* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
- if (mightNotSplit && depth >= minDepth)
+ if (mightNotSplit && (depth >= minDepth || (m_param->bCTUInfo && !md.bestMode)))
{
if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
setLambdaFromQP(parentCTU, qp);
@@ -1346,7 +1511,7 @@ SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom&
}
}
}
- bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE;
+ bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && !((m_param->bCTUInfo & 4) && bCtuInfoCheck);
if (m_param->rdLevel >= 3)
{
/* Calculate RD cost of best inter option */
@@ -1584,10 +1749,19 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
+ bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
bool skipRecursion = false;
bool skipModes = false;
bool splitIntra = true;
bool skipRectAmp = false;
+ bool bCtuInfoCheck = false;
+ int sameContentRef = 0;
+
+ if (m_evaluateInter == 1)
+ {
+ skipRectAmp = !!md.bestMode;
+ mightSplit &= false;
+ }
// avoid uninitialize value in below reference
if (m_param->limitModes)
@@ -1607,7 +1781,58 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
splitData[3].initSplitCUData();
uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
uint32_t refMasks[2];
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+ if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
+ {
+ if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
+ sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
+ if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
+ {
+ mightNotSplit &= bDecidedDepth;
+ bCtuInfoCheck = skipRecursion = false;
+ skipModes = true;
+ }
+ else if (mightNotSplit && bDecidedDepth)
+ {
+ if (m_additionalCtuInfo[cuGeom.absPartIdx])
+ {
+ bCtuInfoCheck = skipRecursion = true;
+ refMasks[0] = allSplitRefs;
+ md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
+ checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
+ checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+ if (!sameContentRef)
+ {
+ if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
+ {
+ qp -= int32_t(0.04 * qp);
+ setLambdaFromQP(parentCTU, qp);
+ }
+ if (m_param->bCTUInfo & 4)
+ skipModes = false;
+ }
+ if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
+ {
+ if (m_param->rdLevel)
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+ if ((m_param->bCTUInfo & 4) && sameContentRef)
+ skipModes = md.bestMode && true;
+ }
+ }
+ else
+ {
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+ checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+ skipModes = !!m_param->bEnableEarlySkip && md.bestMode;
+ refMasks[0] = allSplitRefs;
+ md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
+ checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
+ checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+ }
+ mightSplit &= !bDecidedDepth;
+ }
+ }
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
{
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
{
@@ -1625,7 +1850,7 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
}
- if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
+ if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
skipRectAmp = true && !!md.bestMode;
}
}
@@ -1653,7 +1878,7 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
}
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
- if (mightNotSplit && !md.bestMode)
+ if (mightNotSplit && !md.bestMode && !bCtuInfoCheck)
{
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
@@ -1672,6 +1897,8 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
/* Step 2. Evaluate each of the 4 split sub-blocks in series */
if (mightSplit && !skipRecursion)
{
+ if (bCtuInfoCheck && m_param->bCTUInfo & 2)
+ qp = int((1 / 0.96) * qp + 0.5);
Mode* splitPred = &md.pred[PRED_SPLIT];
splitPred->initCosts();
CUData* splitCU = &splitPred->cu;
@@ -1908,7 +2135,7 @@ SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom&
}
}
- if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE)
+ if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && (cuGeom.log2CUSize != MAX_LOG2_CU_SIZE) && !((m_param->bCTUInfo & 4) && bCtuInfoCheck))
{
if (!m_param->limitReferences || splitIntra)
{
@@ -2008,10 +2235,14 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
ModeDepth& md = m_modeDepth[depth];
md.bestMode = NULL;
+ m_evaluateInter = 0;
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
+ int split = (m_param->interRefine && cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)
+ && bDecidedDepth && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP);
+
if (bDecidedDepth)
{
setLambdaFromQP(parentCTU, qp, lqp);
@@ -2022,8 +2253,11 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
PartSize size = (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx];
if (parentCTU.isIntra(cuGeom.absPartIdx))
{
- memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
- memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
+ {
+ memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+ }
checkIntra(mode, cuGeom, size);
}
else
@@ -2033,20 +2267,22 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
for (uint32_t part = 0; part < numPU; part++)
{
PredictionUnit pu(mode.cu, cuGeom, part);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
int cuIdx = (mode.cu.m_cuAddr * parentCTU.m_numPartitions) + cuGeom.absPartIdx;
mode.cu.m_mergeFlag[pu.puAbsPartIdx] = interDataCTU->mergeFlag[cuIdx + part];
mode.cu.setPUInterDir(interDataCTU->interDir[cuIdx + part], pu.puAbsPartIdx, part);
- for (int dir = 0; dir < m_slice->isInterB() + 1; dir++)
+ for (int list = 0; list < m_slice->isInterB() + 1; list++)
{
- mode.cu.setPUMv(dir, interDataCTU->mv[dir][cuIdx + part], pu.puAbsPartIdx, part);
- mode.cu.setPURefIdx(dir, interDataCTU->refIdx[dir][cuIdx + part], pu.puAbsPartIdx, part);
- mode.cu.m_mvpIdx[dir][pu.puAbsPartIdx] = interDataCTU->mvpIdx[dir][cuIdx + part];
+ mode.cu.setPUMv(list, interDataCTU->mv[list][cuIdx + part], pu.puAbsPartIdx, part);
+ mode.cu.setPURefIdx(list, interDataCTU->refIdx[list][cuIdx + part], pu.puAbsPartIdx, part);
+ mode.cu.m_mvpIdx[list][pu.puAbsPartIdx] = interDataCTU->mvpIdx[list][cuIdx + part];
}
if (!mode.cu.m_mergeFlag[pu.puAbsPartIdx])
{
+ if (m_param->mvRefine)
+ m_me.setSourcePU(*mode.fencYuv, pu.ctuAddr, pu.cuAbsPartIdx, pu.puAbsPartIdx, pu.width, pu.height, m_param->searchMethod, m_param->subpelRefine, false);
//AMVP
MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 2];
mode.cu.getNeighbourMV(part, pu.puAbsPartIdx, mode.interNeighbours);
@@ -2057,14 +2293,31 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
continue;
mode.cu.getPMV(mode.interNeighbours, list, ref, mode.amvpCand[list][ref], mvc);
MV mvp = mode.amvpCand[list][ref][mode.cu.m_mvpIdx[list][pu.puAbsPartIdx]];
+ if (m_param->mvRefine)
+ {
+ MV outmv;
+ searchMV(mode, pu, list, ref, outmv);
+ mode.cu.setPUMv(list, outmv, pu.puAbsPartIdx, part);
+ }
mode.cu.m_mvd[list][pu.puAbsPartIdx] = mode.cu.m_mv[list][pu.puAbsPartIdx] - mvp;
}
}
+ else if(m_param->scaleFactor)
+ {
+ MVField candMvField[MRG_MAX_NUM_CANDS][2]; // double length for mv of both lists
+ uint8_t candDir[MRG_MAX_NUM_CANDS];
+ mode.cu.getInterMergeCandidates(pu.puAbsPartIdx, part, candMvField, candDir);
+ uint8_t mvpIdx = mode.cu.m_mvpIdx[0][pu.puAbsPartIdx];
+ mode.cu.setPUInterDir(candDir[mvpIdx], pu.puAbsPartIdx, part);
+ mode.cu.setPUMv(0, candMvField[mvpIdx][0].mv, pu.puAbsPartIdx, part);
+ mode.cu.setPUMv(1, candMvField[mvpIdx][1].mv, pu.puAbsPartIdx, part);
+ mode.cu.setPURefIdx(0, (int8_t)candMvField[mvpIdx][0].refIdx, pu.puAbsPartIdx, part);
+ mode.cu.setPURefIdx(1, (int8_t)candMvField[mvpIdx][1].refIdx, pu.puAbsPartIdx, part);
+ }
}
motionCompensation(mode.cu, pu, mode.predYuv, true, (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400));
}
-
- if (parentCTU.isSkipped(cuGeom.absPartIdx))
+ if (!m_param->interRefine && parentCTU.isSkipped(cuGeom.absPartIdx))
encodeResAndCalcRdSkipCU(mode);
else
encodeResAndCalcRdInterCU(mode, cuGeom);
@@ -2083,11 +2336,18 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
if (mightSplit && m_param->rdLevel < 5)
checkDQPForSplitPred(*md.bestMode, cuGeom);
+
+ if (m_param->interRefine && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP && !mode.cu.isSkipped(0))
+ {
+ m_evaluateInter = 1;
+ m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, cuGeom, qp) : compressInterCU_rd0_4(parentCTU, cuGeom, qp);
+ }
}
- else
+ if (!bDecidedDepth || split)
{
Mode* splitPred = &md.pred[PRED_SPLIT];
- md.bestMode = splitPred;
+ if (!split)
+ md.bestMode = splitPred;
splitPred->initCosts();
CUData* splitCU = &splitPred->cu;
splitCU->initSubCU(parentCTU, cuGeom, qp);
@@ -2109,8 +2369,12 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
- int lamdaQP = m_param->analysisRefineLevel == 10 ? nextQP : lqp;
- qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
+ int lamdaQP = m_param->analysisReuseLevel == 10 ? nextQP : lqp;
+
+ if (split)
+ m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, childGeom, nextQP) : compressInterCU_rd0_4(parentCTU, childGeom, nextQP);
+ else
+ qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
// Save best CU and pred data for this sub CU
splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
@@ -2131,6 +2395,14 @@ void Analysis::recodeCU(const CUData& parentCTU, const CUGeom& cuGeom, int32_t q
else
updateModeCost(*splitPred);
+ if (m_param->interRefine)
+ {
+ if (m_param->rdLevel > 1)
+ checkBestMode(*splitPred, cuGeom.depth);
+ else if (splitPred->sa8dCost < md.bestMode->sa8dCost)
+ md.bestMode = splitPred;
+ }
+
checkDQPForSplitPred(*splitPred, cuGeom);
/* Copy best data to encData CTU and recon */
@@ -2174,7 +2446,7 @@ void Analysis::checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGe
int safeX, maxSafeMv;
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
{
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
}
for (uint32_t i = 0; i < numMergeCand; ++i)
@@ -2200,7 +2472,7 @@ void Analysis::checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGe
}
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
- tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
+ tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
candMvField[i][0].mv.x > maxSafeMv)
// skip merge candidates which reference beyond safe reference area
continue;
@@ -2304,7 +2576,7 @@ void Analysis::checkMerge2Nx2N_rd5_6(Mode& skip, Mode& merge, const CUGeom& cuGe
int safeX, maxSafeMv;
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
{
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
}
for (uint32_t i = 0; i < numMergeCand; i++)
@@ -2345,7 +2617,7 @@ void Analysis::checkMerge2Nx2N_rd5_6(Mode& skip, Mode& merge, const CUGeom& cuGe
triedBZero = true;
}
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
- tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
+ tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
candMvField[i][0].mv.x > maxSafeMv)
// skip merge candidates which reference beyond safe reference area
continue;
@@ -2420,7 +2692,7 @@ void Analysis::checkInter_rd0_4(Mode& interMode, const CUGeom& cuGeom, PartSize
interMode.cu.setPredModeSubParts(MODE_INTER);
int numPredDir = m_slice->isInterP() ? 1 : 2;
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
@@ -2462,7 +2734,7 @@ void Analysis::checkInter_rd0_4(Mode& interMode, const CUGeom& cuGeom, PartSize
}
interMode.sa8dCost = m_rdCost.calcRdSADCost((uint32_t)interMode.distortion, interMode.sa8dBits);
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
@@ -2484,7 +2756,7 @@ void Analysis::checkInter_rd5_6(Mode& interMode, const CUGeom& cuGeom, PartSize
interMode.cu.setPredModeSubParts(MODE_INTER);
int numPredDir = m_slice->isInterP() ? 1 : 2;
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
@@ -2518,7 +2790,7 @@ void Analysis::checkInter_rd5_6(Mode& interMode, const CUGeom& cuGeom, PartSize
/* predInterSearch sets interMode.sa8dBits, but this is ignored */
encodeResAndCalcRdInterCU(interMode, cuGeom);
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
{
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
int index = 0;
@@ -2671,7 +2943,7 @@ void Analysis::checkBidir2Nx2N(Mode& inter2Nx2N, Mode& bidir2Nx2N, const CUGeom&
void Analysis::encodeResidue(const CUData& ctu, const CUGeom& cuGeom)
{
- if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < g_maxCUDepth)
+ if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < ctu.m_encData->m_param->maxCUDepth)
{
for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
{
@@ -2970,7 +3242,7 @@ int Analysis::calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, int3
uint32_t block_x = ctu.m_cuPelX + g_zscanToPelX[cuGeom.absPartIdx];
uint32_t block_y = ctu.m_cuPelY + g_zscanToPelY[cuGeom.absPartIdx];
uint32_t maxCols = (m_frame->m_fencPic->m_picWidth + (loopIncr - 1)) / loopIncr;
- uint32_t blockSize = g_maxCUSize >> cuGeom.depth;
+ uint32_t blockSize = m_param->maxCUSize >> cuGeom.depth;
double qp_offset = 0;
uint32_t cnt = 0;
uint32_t idx;
@@ -3064,3 +3336,22 @@ void Analysis::calculateNormFactor(CUData& ctu, int qp)
normFactor(srcV, blockSizeC, ctu, qp, TEXT_CHROMA_V);
}
}
+
+int Analysis::findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom)
+{
+ int sameContentRef = 0;
+ int m_curPoc = parentCTU.m_slice->m_poc;
+ int prevChange = m_prevCtuInfoChange[cuGeom.absPartIdx];
+ int numPredDir = m_slice->isInterP() ? 1 : 2;
+ for (int list = 0; list < numPredDir; list++)
+ {
+ for (int i = 0; i < m_frame->m_encData->m_slice->m_numRefIdx[list]; i++)
+ {
+ int refPoc = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_poc;
+ int refPrevChange = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_addOnPrevChange[parentCTU.m_cuAddr][cuGeom.absPartIdx];
+ if ((refPoc < prevChange && refPoc < m_curPoc) || (refPoc > m_curPoc && prevChange < m_curPoc && refPrevChange > m_curPoc) || ((refPoc == prevChange) && (m_additionalCtuInfo[cuGeom.absPartIdx] == CTU_INFO_CHANGE)))
+ sameContentRef++; /* Content changed */
+ }
+ }
+ return sameContentRef;
+}
diff --git a/source/encoder/analysis.h b/source/encoder/analysis.h
index 44f38f1..077db0c 100644
--- a/source/encoder/analysis.h
+++ b/source/encoder/analysis.h
@@ -137,6 +137,10 @@ protected:
int* m_multipassMvpIdx[2];
int32_t* m_multipassRef[2];
uint8_t* m_multipassModes;
+
+ uint8_t m_evaluateInter;
+ uint8_t* m_additionalCtuInfo;
+ int* m_prevCtuInfoChange;
/* refine RD based on QP for rd-levels 5 and 6 */
void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
@@ -178,6 +182,9 @@ protected:
void calculateNormFactor(CUData& ctu, int qp);
void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
+
+ void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
+
/* check whether current mode is the new best */
inline void checkBestMode(Mode& mode, uint32_t depth)
{
@@ -190,6 +197,7 @@ protected:
else
md.bestMode = &mode;
}
+ int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
};
struct ThreadLocalData
diff --git a/source/encoder/api.cpp b/source/encoder/api.cpp
index d38ba81..85fb893 100644
--- a/source/encoder/api.cpp
+++ b/source/encoder/api.cpp
@@ -30,6 +30,7 @@
#include "level.h"
#include "nal.h"
#include "bitcost.h"
+#include "x265-extras.h"
/* multilib namespace reflectors */
#if LINKED_8BIT
@@ -96,9 +97,6 @@ x265_encoder *x265_encoder_open(x265_param *p)
if (x265_check_params(param))
goto fail;
- if (x265_set_globals(param))
- goto fail;
-
encoder = new Encoder;
if (!param->rc.bEnableSlowFirstPass)
PARAM_NS::x265_param_apply_fastfirstpass(param);
@@ -119,6 +117,17 @@ x265_encoder *x265_encoder_open(x265_param *p)
}
encoder->create();
+ /* Try to open CSV file handle */
+ if (encoder->m_param->csvfn)
+ {
+ encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel);
+ if (!encoder->m_param->csvfpt)
+ {
+ x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn);
+ encoder->m_aborted = true;
+ }
+ }
+
encoder->m_latestParam = latestParam;
memcpy(latestParam, param, sizeof(x265_param));
if (encoder->m_aborted)
@@ -144,7 +153,10 @@ int x265_encoder_headers(x265_encoder *enc, x265_nal **pp_nal, uint32_t *pi_nal)
if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS)
{
if (!encoder->computeSPSRPSIndex())
+ {
+ encoder->m_aborted = true;
return -1;
+ }
}
encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs);
*pp_nal = &encoder->m_nalList.m_nal[0];
@@ -152,6 +164,11 @@ int x265_encoder_headers(x265_encoder *enc, x265_nal **pp_nal, uint32_t *pi_nal)
return encoder->m_nalList.m_occupancy;
}
+ if (enc)
+ {
+ Encoder *encoder = static_cast<Encoder*>(enc);
+ encoder->m_aborted = true;
+ }
return -1;
}
@@ -251,6 +268,12 @@ int x265_encoder_encode(x265_encoder *enc, x265_nal **pp_nal, uint32_t *pi_nal,
else if (pi_nal)
*pi_nal = 0;
+ if (numEncoded && encoder->m_param->csvLogLevel)
+ x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel);
+
+ if (numEncoded < 0)
+ encoder->m_aborted = true;
+
return numEncoded;
}
@@ -263,12 +286,17 @@ void x265_encoder_get_stats(x265_encoder *enc, x265_stats *outputStats, uint32_t
}
}
-void x265_encoder_log(x265_encoder* enc, int, char **)
+void x265_encoder_log(x265_encoder* enc, int argc, char **argv)
{
if (enc)
{
Encoder *encoder = static_cast<Encoder*>(enc);
- x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n");
+ x265_stats stats;
+ int padx = encoder->m_sps.conformanceWindow.rightOffset;
+ int pady = encoder->m_sps.conformanceWindow.bottomOffset;
+ encoder->fetchStats(&stats, sizeof(stats));
+ const x265_api * api = x265_api_get(0);
+ x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv);
}
}
@@ -282,7 +310,6 @@ void x265_encoder_close(x265_encoder *enc)
encoder->printSummary();
encoder->destroy();
delete encoder;
- ATOMIC_DEC(&g_ctuSizeConfigured);
}
}
@@ -295,14 +322,18 @@ int x265_encoder_intra_refresh(x265_encoder *enc)
encoder->m_bQueuedIntraRefresh = 1;
return 0;
}
+int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu)
+{
+ if (!ctu || !enc)
+ return -1;
+ Encoder* encoder = static_cast<Encoder*>(enc);
+ encoder->copyCtuInfo(ctu, poc);
+ return 0;
+}
void x265_cleanup(void)
{
- if (!g_ctuSizeConfigured)
- {
- BitCost::destroy();
- CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
- }
+ BitCost::destroy();
}
x265_picture *x265_picture_alloc()
@@ -321,14 +352,14 @@ void x265_picture_init(x265_param *param, x265_picture *pic)
pic->userSEI.payloads = NULL;
pic->userSEI.numPayloads = 0;
- if (param->analysisMode)
+ if (param->analysisReuseMode)
{
- uint32_t widthInCU = (param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize;
+ uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
pic->analysisData.numCUsInFrame = numCUsInFrame;
- pic->analysisData.numPartitions = NUM_4x4_PARTITIONS;
+ pic->analysisData.numPartitions = param->num4x4Partitions;
}
}
@@ -372,6 +403,7 @@ static const x265_api libapi =
sizeof(x265_frame_stats),
&x265_encoder_intra_refresh,
+ &x265_encoder_ctu_info,
};
typedef const x265_api* (*api_get_func)(int bitDepth);
diff --git a/source/encoder/dpb.cpp b/source/encoder/dpb.cpp
index 3a8fef5..c225cf3 100644
--- a/source/encoder/dpb.cpp
+++ b/source/encoder/dpb.cpp
@@ -105,6 +105,23 @@ void DPB::recycleUnreferenced()
}
}
+ if (curFrame->m_ctuInfo != NULL)
+ {
+ uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+ uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+ uint32_t numCUsInFrame = widthInCU * heightInCU;
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
+ {
+ X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+ (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+ }
+ X265_FREE(*curFrame->m_ctuInfo);
+ *(curFrame->m_ctuInfo) = NULL;
+ X265_FREE(curFrame->m_ctuInfo);
+ curFrame->m_ctuInfo = NULL;
+ X265_FREE(curFrame->m_prevCtuInfoChange);
+ curFrame->m_prevCtuInfoChange = NULL;
+ }
curFrame->m_encData = NULL;
curFrame->m_reconPic = NULL;
}
@@ -187,7 +204,7 @@ void DPB::prepareEncode(Frame *newFrame)
}
// Disable Loopfilter in bound area, because we will do slice-parallelism in future
- slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
+ slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
/* Increment reference count of all motion-referenced frames to prevent them
* from being recycled. These counts are decremented at the end of
diff --git a/source/encoder/encoder.cpp b/source/encoder/encoder.cpp
index 9aea032..0709d0d 100644
--- a/source/encoder/encoder.cpp
+++ b/source/encoder/encoder.cpp
@@ -86,8 +86,10 @@ Encoder::Encoder()
m_frameEncoder[i] = NULL;
MotionEstimate::initScales();
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
m_hdr10plus_api = hdr10plus_api_get();
+ numCimInfo = 0;
+ cim = NULL;
#endif
m_prevTonemapPayload.payload = NULL;
@@ -132,26 +134,19 @@ void Encoder::create()
if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
allowPools = false;
- if (!p->frameNumThreads)
- {
- // auto-detect frame threads
- int cpuCount = ThreadPool::getCpuCount();
- if (!p->bEnableWavefront)
- p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
- else if (cpuCount >= 32)
- p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher
- else if (cpuCount >= 16)
- p->frameNumThreads = 5; // 8 HT cores, or dual socket
- else if (cpuCount >= 8)
- p->frameNumThreads = 3; // 4 HT cores
- else if (cpuCount >= 4)
- p->frameNumThreads = 2; // Dual or Quad core
- else
- p->frameNumThreads = 1;
- }
m_numPools = 0;
if (allowPools)
m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
+ else
+ {
+ if (!p->frameNumThreads)
+ {
+ // auto-detect frame threads
+ int cpuCount = ThreadPool::getCpuCount();
+ ThreadPool::getFrameThreadsCount(p, cpuCount);
+ }
+ }
+
if (!m_numPools)
{
// issue warnings if any of these features were requested
@@ -320,8 +315,8 @@ void Encoder::create()
else
m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
- int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
- int numCols = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
+ int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ int numCols = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
for (int i = 0; i < m_param->frameNumThreads; i++)
{
if (!m_frameEncoder[i]->init(this, numRows, numCols))
@@ -346,12 +341,12 @@ void Encoder::create()
initRefIdx();
- if (m_param->analysisMode)
+ if (m_param->analysisReuseMode)
{
- const char* name = m_param->analysisFileName;
+ const char* name = m_param->analysisReuseFileName;
if (!name)
name = defaultAnalysisFileName;
- const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
+ const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
m_analysisFile = x265_fopen(name, mode);
if (!m_analysisFile)
{
@@ -362,7 +357,7 @@ void Encoder::create()
if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
{
- const char* name = m_param->analysisFileName;
+ const char* name = m_param->analysisReuseFileName;
if (!name)
name = defaultAnalysisFileName;
if (m_param->rc.bStatWrite)
@@ -431,6 +426,10 @@ void Encoder::stopJobs()
void Encoder::destroy()
{
+#if ENABLE_HDR10_PLUS
+ m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo);
+#endif
+
if (m_exportedPic)
{
ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
@@ -482,7 +481,7 @@ void Encoder::destroy()
{
int bError = 1;
fclose(m_analysisFileOut);
- const char* name = m_param->analysisFileName;
+ const char* name = m_param->analysisReuseFileName;
if (!name)
name = defaultAnalysisFileName;
char* temp = strcatFilename(name, ".temp");
@@ -499,11 +498,14 @@ void Encoder::destroy()
}
if (m_param)
{
+ if (m_param->csvfpt)
+ fclose(m_param->csvfpt);
/* release string arguments that were strdup'd */
free((char*)m_param->rc.lambdaFileName);
free((char*)m_param->rc.statFileName);
- free((char*)m_param->analysisFileName);
+ free((char*)m_param->analysisReuseFileName);
free((char*)m_param->scalingLists);
+ free((char*)m_param->csvfn);
free((char*)m_param->numaPools);
free((char*)m_param->masteringDisplayColorVolume);
free((char*)m_param->toneMapFile);
@@ -518,7 +520,7 @@ void Encoder::updateVbvPlan(RateControl* rc)
FrameEncoder *encoder = m_frameEncoder[i];
if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc)
{
- int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
+ int64_t bits = m_param->rc.bEnableConstVbv ? (int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
rc->m_bufferFill -= bits;
rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0);
rc->m_bufferFill += encoder->m_rce.bufferRate;
@@ -593,6 +595,8 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
if (m_exportedPic)
{
+ if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
+ freeAnalysis(&m_exportedPic->m_analysisData);
ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
m_exportedPic = NULL;
m_dpb->recycleUnreferenced();
@@ -601,16 +605,22 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
{
x265_sei_payload toneMap;
toneMap.payload = NULL;
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
if (m_bToneMap)
{
- uint8_t *cim = NULL;
- if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim))
+ if (pic_in->poc == 0)
+ numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim);
+ if (pic_in->poc < numCimInfo)
{
- toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]);
- toneMap.payloadSize = cim[0];
+ int32_t i = 0;
+ toneMap.payloadSize = 0;
+ while (cim[pic_in->poc][i] == 0xFF)
+ toneMap.payloadSize += cim[pic_in->poc][i++];
+ toneMap.payloadSize += cim[pic_in->poc][i++];
+
+ toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize);
toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35;
- memcpy(toneMap.payload, cim, toneMap.payloadSize);
+ memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize);
}
}
#endif
@@ -708,7 +718,7 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
for (int i = 0; i < numPayloads; i++)
{
x265_sei_payload input;
- if (i == (numPayloads - 1))
+ if ((i == (numPayloads - 1)) && toneMapEnable)
input = toneMap;
else
input = pic_in->userSEI.payloads[i];
@@ -754,24 +764,40 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
/* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */
/* Load analysis data before lookahead->addPicture, since sliceType has been decided */
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
{
- x265_picture* inputPic = const_cast<x265_picture*>(pic_in);
/* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */
- readAnalysisFile(&inputPic->analysisData, inFrame->m_poc);
- inFrame->m_analysisData.poc = inFrame->m_poc;
- inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType;
- inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut;
- inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost;
- inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame;
- inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions;
- inFrame->m_analysisData.wt = inputPic->analysisData.wt;
- inFrame->m_analysisData.interData = inputPic->analysisData.interData;
- inFrame->m_analysisData.intraData = inputPic->analysisData.intraData;
- sliceType = inputPic->analysisData.sliceType;
+ readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in);
+ sliceType = inFrame->m_analysisData.sliceType;
inFrame->m_lowres.bScenecut = !!inFrame->m_analysisData.bScenecut;
inFrame->m_lowres.satdCost = inFrame->m_analysisData.satdCost;
}
+ if (m_param->bUseRcStats && pic_in->rcData)
+ {
+ RcStats* rc = (RcStats*)pic_in->rcData;
+ m_rateControl->m_accumPQp = rc->cumulativePQp;
+ m_rateControl->m_accumPNorm = rc->cumulativePNorm;
+ m_rateControl->m_isNextGop = true;
+ for (int j = 0; j < 3; j++)
+ m_rateControl->m_lastQScaleFor[j] = rc->lastQScaleFor[j];
+ m_rateControl->m_wantedBitsWindow = rc->wantedBitsWindow;
+ m_rateControl->m_cplxrSum = rc->cplxrSum;
+ m_rateControl->m_totalBits = rc->totalBits;
+ m_rateControl->m_encodedBits = rc->encodedBits;
+ m_rateControl->m_shortTermCplxSum = rc->shortTermCplxSum;
+ m_rateControl->m_shortTermCplxCount = rc->shortTermCplxCount;
+ if (m_rateControl->m_isVbv)
+ {
+ m_rateControl->m_bufferFillFinal = rc->bufferFillFinal;
+ for (int i = 0; i < 4; i++)
+ {
+ m_rateControl->m_pred[i].coeff = rc->coeff[i];
+ m_rateControl->m_pred[i].count = rc->count[i];
+ m_rateControl->m_pred[i].offset = rc->offset[i];
+ }
+ }
+ m_param->bUseRcStats = 0;
+ }
if (m_reconfigureRc)
inFrame->m_reconfigureRc = true;
@@ -805,7 +831,7 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
x265_frame_stats* frameData = NULL;
/* Free up pic_in->analysisData since it has already been used */
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
freeAnalysis(&outFrame->m_analysisData);
if (pic_out)
@@ -819,20 +845,7 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
pic_out->pts = outFrame->m_pts;
pic_out->dts = outFrame->m_dts;
-
- switch (slice->m_sliceType)
- {
- case I_SLICE:
- pic_out->sliceType = outFrame->m_lowres.bKeyframe ? X265_TYPE_IDR : X265_TYPE_I;
- break;
- case P_SLICE:
- pic_out->sliceType = X265_TYPE_P;
- break;
- case B_SLICE:
- pic_out->sliceType = X265_TYPE_B;
- break;
- }
-
+ pic_out->sliceType = outFrame->m_lowres.sliceType;
pic_out->planes[0] = recpic->m_picOrg[0];
pic_out->stride[0] = (int)(recpic->m_stride * sizeof(pixel));
if (m_param->internalCsp != X265_CSP_I400)
@@ -844,7 +857,7 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
}
/* Dump analysis data from pic_out to file in save mode and free */
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
{
pic_out->analysisData.poc = pic_out->poc;
pic_out->analysisData.sliceType = pic_out->sliceType;
@@ -856,7 +869,8 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
pic_out->analysisData.interData = outFrame->m_analysisData.interData;
pic_out->analysisData.intraData = outFrame->m_analysisData.intraData;
writeAnalysisFile(&pic_out->analysisData, *outFrame->m_encData);
- freeAnalysis(&pic_out->analysisData);
+ if (m_param->bUseAnalysisFile)
+ freeAnalysis(&pic_out->analysisData);
}
}
if (m_param->rc.bStatWrite && (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion))
@@ -1012,16 +1026,17 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
Slice* slice = frameEnc->m_encData->m_slice;
slice->m_sps = &m_sps;
slice->m_pps = &m_pps;
+ slice->m_param = m_param;
slice->m_maxNumMergeCand = m_param->maxNumMergeCand;
- slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * NUM_4x4_PARTITIONS);
+ slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * m_param->num4x4Partitions);
}
if (m_param->searchMethod == X265_SEA && frameEnc->m_lowres.sliceType != X265_TYPE_B)
{
- int padX = g_maxCUSize + 32;
- int padY = g_maxCUSize + 16;
- uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
- int maxHeight = numCuInHeight * g_maxCUSize;
+ int padX = m_param->maxCUSize + 32;
+ int padY = m_param->maxCUSize + 16;
+ uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
for (int i = 0; i < INTEGRAL_PLANE_NUM; i++)
{
frameEnc->m_encData->m_meBuffer[i] = X265_MALLOC(uint32_t, frameEnc->m_reconPic->m_stride * (maxHeight + (2 * padY)));
@@ -1080,17 +1095,17 @@ int Encoder::encode(const x265_picture* pic_in, x265_picture* pic_out)
frameEnc->m_dts = frameEnc->m_reorderedPts;
/* Allocate analysis data before encode in save mode. This is allocated in frameEnc */
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
{
x265_analysis_data* analysis = &frameEnc->m_analysisData;
analysis->poc = frameEnc->m_poc;
analysis->sliceType = frameEnc->m_lowres.sliceType;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
analysis->numCUsInFrame = numCUsInFrame;
- analysis->numPartitions = NUM_4x4_PARTITIONS;
+ analysis->numPartitions = m_param->num4x4Partitions;
allocAnalysis(analysis);
}
/* determine references, setup RPS, etc */
@@ -1157,6 +1172,120 @@ int Encoder::reconfigureParam(x265_param* encParam, x265_param* param)
return x265_check_params(encParam);
}
+void Encoder::copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc)
+{
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ Frame* curFrame;
+ Frame* prevFrame = NULL;
+ int32_t* frameCTU;
+ uint32_t numCUsInFrame = widthInCU * heightInCU;
+ uint32_t maxNum8x8Partitions = 64;
+ bool copied = false;
+ do
+ {
+ curFrame = m_lookahead->m_inputQueue.getPOC(poc);
+ if (!curFrame)
+ curFrame = m_lookahead->m_outputQueue.getPOC(poc);
+
+ if (poc > 0)
+ {
+ prevFrame = m_lookahead->m_inputQueue.getPOC(poc - 1);
+ if (!prevFrame)
+ prevFrame = m_lookahead->m_outputQueue.getPOC(poc - 1);
+ if (!prevFrame)
+ {
+ FrameEncoder* prevEncoder;
+ for (int i = 0; i < m_param->frameNumThreads; i++)
+ {
+ prevEncoder = m_frameEncoder[i];
+ prevFrame = prevEncoder->m_frame;
+ if (prevFrame && (prevEncoder->m_frame->m_poc == poc - 1))
+ {
+ prevFrame = prevEncoder->m_frame;
+ break;
+ }
+ }
+ }
+ }
+ x265_ctu_info_t* ctuTemp, *prevCtuTemp;
+ if (curFrame)
+ {
+ if (!curFrame->m_ctuInfo)
+ CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+ CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+ CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
+ {
+ ctuTemp = *curFrame->m_ctuInfo + i;
+ CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+ ctuTemp->ctuInfo = (int32_t*)frameCTU;
+ ctuTemp->ctuAddress = frameCtuInfo[i]->ctuAddress;
+ memcpy(ctuTemp->ctuPartitions, frameCtuInfo[i]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+ memcpy(ctuTemp->ctuInfo, frameCtuInfo[i]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+ if (prevFrame && curFrame->m_poc > 1)
+ {
+ prevCtuTemp = *prevFrame->m_ctuInfo + i;
+ for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+ curFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == 2) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j];
+ }
+ }
+ copied = true;
+ curFrame->m_copied.trigger();
+ }
+ else
+ {
+ FrameEncoder* curEncoder;
+ for (int i = 0; i < m_param->frameNumThreads; i++)
+ {
+ curEncoder = m_frameEncoder[i];
+ curFrame = curEncoder->m_frame;
+ if (curFrame)
+ {
+ if (poc == curFrame->m_poc)
+ {
+ if (!curFrame->m_ctuInfo)
+ CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+ CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+ CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+ for (uint32_t l = 0; l < numCUsInFrame; l++)
+ {
+ ctuTemp = *curFrame->m_ctuInfo + l;
+ CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+ ctuTemp->ctuInfo = (int32_t*)frameCTU;
+ ctuTemp->ctuAddress = frameCtuInfo[l]->ctuAddress;
+ memcpy(ctuTemp->ctuPartitions, frameCtuInfo[l]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+ memcpy(ctuTemp->ctuInfo, frameCtuInfo[l]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+ if (prevFrame && curFrame->m_poc > 1)
+ {
+ prevCtuTemp = *prevFrame->m_ctuInfo + l;
+ for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+ curFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == CTU_INFO_CHANGE) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j];
+ }
+ }
+ copied = true;
+ curFrame->m_copied.trigger();
+ break;
+ }
+ }
+ }
+ }
+ } while (!copied);
+ return;
+fail:
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
+ {
+ X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+ (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+ }
+ X265_FREE(*curFrame->m_ctuInfo);
+ *(curFrame->m_ctuInfo) = NULL;
+ X265_FREE(curFrame->m_ctuInfo);
+ curFrame->m_ctuInfo = NULL;
+ X265_FREE(curFrame->m_prevCtuInfoChange);
+ curFrame->m_prevCtuInfoChange = NULL;
+}
+
void EncStats::addPsnr(double psnrY, double psnrU, double psnrV)
{
m_psnrSumY += psnrY;
@@ -1286,7 +1415,7 @@ void Encoder::printSummary()
/* Summarize stats from all frame encoders */
CUStats cuStats;
for (int i = 0; i < m_param->frameNumThreads; i++)
- cuStats.accumulate(m_frameEncoder[i]->m_cuStats);
+ cuStats.accumulate(m_frameEncoder[i]->m_cuStats, *m_param);
if (!cuStats.totalCTUTime)
return;
@@ -1307,7 +1436,7 @@ void Encoder::printSummary()
int64_t interRDOTotalTime = 0, intraRDOTotalTime = 0;
uint64_t interRDOTotalCount = 0, intraRDOTotalCount = 0;
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
interRDOTotalTime += cuStats.interRDOElapsedTime[i];
intraRDOTotalTime += cuStats.intraRDOElapsedTime[i];
@@ -1417,7 +1546,7 @@ void Encoder::printSummary()
}
x265_log(m_param, X265_LOG_INFO, "CU: " X265_LL " %dX%d CTUs compressed in %.3lf seconds, %.3lf CTUs per worker-second\n",
- cuStats.totalCTUs, g_maxCUSize, g_maxCUSize,
+ cuStats.totalCTUs, m_param->maxCUSize, m_param->maxCUSize,
ELAPSED_SEC(totalWorkerTime),
cuStats.totalCTUs / ELAPSED_SEC(totalWorkerTime));
@@ -1578,6 +1707,8 @@ void Encoder::finishFrameStats(Frame* curFrame, FrameEncoder *curEncoder, x265_f
frameStats->qp = curEncData.m_avgQpAq;
frameStats->bits = bits;
frameStats->bScenecut = curFrame->m_lowres.bScenecut;
+ if (m_param->csvLogLevel >= 2)
+ frameStats->ipCostRatio = curFrame->m_lowres.ipCostRatio;
frameStats->bufferFill = m_rateControl->m_bufferFillActual;
frameStats->frameLatency = inPoc - poc;
if (m_param->rc.rateControlMode == X265_RC_CRF)
@@ -1602,35 +1733,83 @@ void Encoder::finishFrameStats(Frame* curFrame, FrameEncoder *curEncoder, x265_f
#define ELAPSED_MSEC(start, end) (((double)(end) - (start)) / 1000)
- frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
- frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
- frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
- frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
- frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
- frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
- frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
- if (curEncoder->m_totalActiveWorkerCount)
- frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
- else
- frameStats->avgWPP = 1;
- frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
-
- frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;
- frameStats->avgChromaDistortion = curFrame->m_encData->m_frameStats.avgChromaDistortion;
- frameStats->avgLumaDistortion = curFrame->m_encData->m_frameStats.avgLumaDistortion;
- frameStats->avgPsyEnergy = curFrame->m_encData->m_frameStats.avgPsyEnergy;
- frameStats->avgResEnergy = curFrame->m_encData->m_frameStats.avgResEnergy;
- frameStats->avgLumaLevel = curFrame->m_fencPic->m_avgLumaLevel;
- frameStats->maxLumaLevel = curFrame->m_fencPic->m_maxLumaLevel;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- {
- frameStats->cuStats.percentSkipCu[depth] = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
- frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
- frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
- frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
- frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
- for (int n = 0; n < INTRA_MODES; n++)
- frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+ frameStats->maxLumaLevel = curFrame->m_fencPic->m_maxLumaLevel;
+ frameStats->minLumaLevel = curFrame->m_fencPic->m_minLumaLevel;
+ frameStats->avgLumaLevel = curFrame->m_fencPic->m_avgLumaLevel;
+
+ if (m_param->csvLogLevel >= 2)
+ {
+ frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
+ frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
+ frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
+ frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
+ frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
+ frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
+ frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
+ if (curEncoder->m_totalActiveWorkerCount)
+ frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
+ else
+ frameStats->avgWPP = 1;
+ frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
+
+ frameStats->avgChromaDistortion = curFrame->m_encData->m_frameStats.avgChromaDistortion;
+ frameStats->avgLumaDistortion = curFrame->m_encData->m_frameStats.avgLumaDistortion;
+ frameStats->avgPsyEnergy = curFrame->m_encData->m_frameStats.avgPsyEnergy;
+ frameStats->avgResEnergy = curFrame->m_encData->m_frameStats.avgResEnergy;
+
+ frameStats->maxChromaULevel = curFrame->m_fencPic->m_maxChromaULevel;
+ frameStats->minChromaULevel = curFrame->m_fencPic->m_minChromaULevel;
+ frameStats->avgChromaULevel = curFrame->m_fencPic->m_avgChromaULevel;
+
+ frameStats->maxChromaVLevel = curFrame->m_fencPic->m_maxChromaVLevel;
+ frameStats->minChromaVLevel = curFrame->m_fencPic->m_minChromaVLevel;
+ frameStats->avgChromaVLevel = curFrame->m_fencPic->m_avgChromaVLevel;
+
+ if (curFrame->m_encData->m_frameStats.totalPu[4] == 0)
+ frameStats->puStats.percentNxN = 0;
+ else
+ frameStats->puStats.percentNxN = (double)(curFrame->m_encData->m_frameStats.cnt4x4 / (double)curFrame->m_encData->m_frameStats.totalPu[4]) * 100;
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ if (curFrame->m_encData->m_frameStats.totalPu[depth] == 0)
+ {
+ frameStats->puStats.percentSkipPu[depth] = 0;
+ frameStats->puStats.percentIntraPu[depth] = 0;
+ frameStats->puStats.percentAmpPu[depth] = 0;
+ for (int i = 0; i < INTER_MODES - 1; i++)
+ {
+ frameStats->puStats.percentInterPu[depth][i] = 0;
+ frameStats->puStats.percentMergePu[depth][i] = 0;
+ }
+ }
+ else
+ {
+ frameStats->puStats.percentSkipPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntSkipPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentIntraPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntIntraPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentAmpPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntAmp[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ for (int i = 0; i < INTER_MODES - 1; i++)
+ {
+ frameStats->puStats.percentInterPu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntInterPu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentMergePu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntMergePu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ }
+ }
+ }
+ }
+
+ if (m_param->csvLogLevel >= 1)
+ {
+ frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;
+
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ frameStats->cuStats.percentSkipCu[depth] = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
+ frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
+ frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
+ frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
+ frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
+ for (int n = 0; n < INTRA_MODES; n++)
+ frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+ }
}
}
}
@@ -1803,16 +1982,16 @@ void Encoder::initSPS(SPS *sps)
sps->chromaFormatIdc = m_param->internalCsp;
sps->picWidthInLumaSamples = m_param->sourceWidth;
sps->picHeightInLumaSamples = m_param->sourceHeight;
- sps->numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
- sps->numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+ sps->numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ sps->numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
sps->numCUsInFrame = sps->numCuInWidth * sps->numCuInHeight;
- sps->numPartitions = NUM_4x4_PARTITIONS;
- sps->numPartInCUSize = 1 << g_unitSizeDepth;
+ sps->numPartitions = m_param->num4x4Partitions;
+ sps->numPartInCUSize = 1 << m_param->unitSizeDepth;
- sps->log2MinCodingBlockSize = g_maxLog2CUSize - g_maxCUDepth;
- sps->log2DiffMaxMinCodingBlockSize = g_maxCUDepth;
+ sps->log2MinCodingBlockSize = m_param->maxLog2CUSize - m_param->maxCUDepth;
+ sps->log2DiffMaxMinCodingBlockSize = m_param->maxCUDepth;
uint32_t maxLog2TUSize = (uint32_t)g_log2Size[m_param->maxTUSize];
- sps->quadtreeTULog2MaxSize = X265_MIN(g_maxLog2CUSize, maxLog2TUSize);
+ sps->quadtreeTULog2MaxSize = X265_MIN((uint32_t)m_param->maxLog2CUSize, maxLog2TUSize);
sps->quadtreeTULog2MinSize = 2;
sps->quadtreeTUMaxDepthInter = m_param->tuQTMaxInterDepth;
sps->quadtreeTUMaxDepthIntra = m_param->tuQTMaxIntraDepth;
@@ -1820,7 +1999,7 @@ void Encoder::initSPS(SPS *sps)
sps->bUseSAO = m_param->bEnableSAO;
sps->bUseAMP = m_param->bEnableAMP;
- sps->maxAMPDepth = m_param->bEnableAMP ? g_maxCUDepth : 0;
+ sps->maxAMPDepth = m_param->bEnableAMP ? m_param->maxCUDepth : 0;
sps->maxTempSubLayers = m_param->bEnableTemporalSubLayers ? 2 : 1;
sps->maxDecPicBuffering = m_vps.maxDecPicBuffering;
@@ -2034,7 +2213,7 @@ void Encoder::configure(x265_param *p)
p->lookaheadDepth = p->totalFrames;
if (p->bIntraRefresh)
{
- int numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
+ int numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
if (p->maxNumReferences > 1)
{
x265_log(p, X265_LOG_WARNING, "Max References > 1 + intra-refresh is not supported , setting max num references = 1\n");
@@ -2070,23 +2249,68 @@ void Encoder::configure(x265_param *p)
p->rc.rfConstantMin = 0;
}
- if (p->analysisMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
+ if (p->analysisReuseMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
{
x265_log(p, X265_LOG_WARNING, "Analysis load/save options incompatible with pmode/pme, Disabling pmode/pme\n");
p->bDistributeMotionEstimation = p->bDistributeModeAnalysis = 0;
}
- if (p->analysisMode && p->rc.cuTree)
+ if (p->analysisReuseMode && p->rc.cuTree)
{
x265_log(p, X265_LOG_WARNING, "Analysis load/save options works only with cu-tree off, Disabling cu-tree\n");
p->rc.cuTree = 0;
}
- if (p->analysisMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
+ if (p->analysisReuseMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
{
x265_log(p, X265_LOG_WARNING, "Cannot use Analysis load/save option and multi-pass-opt-analysis/multi-pass-opt-distortion together,"
"Disabling Analysis load/save and multi-pass-opt-analysis/multi-pass-opt-distortion\n");
- p->analysisMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+ p->analysisReuseMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+ }
+ if (p->scaleFactor)
+ {
+ if (p->scaleFactor == 1)
+ {
+ p->scaleFactor = 0;
+ }
+ else if (!p->analysisReuseMode || p->analysisReuseLevel < 10)
+ {
+ x265_log(p, X265_LOG_WARNING, "Input scaling works with analysis-reuse-mode, analysis-reuse-level 10. Disabling scale-factor.\n");
+ p->scaleFactor = 0;
+ }
+ }
+
+ if (p->intraRefine)
+ {
+ if (p->analysisReuseMode!= X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "Intra refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling intra refine.\n");
+ p->intraRefine = 0;
+ }
+ }
+
+ if (p->interRefine)
+ {
+ if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "Inter refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling inter refine.\n");
+ p->interRefine = 0;
+ }
+ }
+
+ if (p->limitTU && p->interRefine)
+ {
+ x265_log(p, X265_LOG_WARNING, "Inter refinement does not support limitTU. Disabling limitTU.\n");
+ p->limitTU = 0;
+ }
+
+ if (p->mvRefine)
+ {
+ if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "MV refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling MV refine.\n");
+ p->mvRefine = 0;
+ }
}
if ((p->analysisMultiPassRefine || p->analysisMultiPassDistortion) && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
@@ -2177,9 +2401,17 @@ void Encoder::configure(x265_param *p)
m_conformanceWindow.topOffset = 0;
m_conformanceWindow.bottomOffset = 0;
m_conformanceWindow.leftOffset = 0;
-
/* set pad size if width is not multiple of the minimum CU size */
- if (p->sourceWidth & (p->minCUSize - 1))
+ if (p->scaleFactor == 2 && ((p->sourceWidth / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+ {
+ uint32_t rem = (p->sourceWidth / 2) & (p->minCUSize - 1);
+ uint32_t padsize = p->minCUSize - rem;
+ p->sourceWidth += padsize * 2;
+
+ m_conformanceWindow.bEnabled = true;
+ m_conformanceWindow.rightOffset = padsize * 2;
+ }
+ else if(p->sourceWidth & (p->minCUSize - 1))
{
uint32_t rem = p->sourceWidth & (p->minCUSize - 1);
uint32_t padsize = p->minCUSize - rem;
@@ -2228,7 +2460,7 @@ void Encoder::configure(x265_param *p)
p->dynamicRd = 0;
x265_log(p, X265_LOG_WARNING, "Dynamic-rd disabled, requires RD <= 4, VBV and aq-mode enabled\n");
}
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
if (m_param->bDhdr10opt && m_param->toneMapFile == NULL)
{
x265_log(p, X265_LOG_WARNING, "Disabling dhdr10-opt. dhdr10-info must be enabled.\n");
@@ -2252,7 +2484,7 @@ void Encoder::configure(x265_param *p)
#else
if (m_param->toneMapFile)
{
- x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable dynamic HDR in cmake.\n");
+ x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable HDR10_PLUS in cmake.\n");
m_bToneMap = 0;
m_param->toneMapFile = NULL;
}
@@ -2358,9 +2590,16 @@ void Encoder::configure(x265_param *p)
x265_log(p, X265_LOG_ERROR, "uhd-bd: Disabled\n");
}
}
-
/* set pad size if height is not multiple of the minimum CU size */
- if (p->sourceHeight & (p->minCUSize - 1))
+ if (p->scaleFactor == 2 && ((p->sourceHeight / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+ {
+ uint32_t rem = (p->sourceHeight / 2) & (p->minCUSize - 1);
+ uint32_t padsize = p->minCUSize - rem;
+ p->sourceHeight += padsize * 2;
+ m_conformanceWindow.bEnabled = true;
+ m_conformanceWindow.bottomOffset = padsize * 2;
+ }
+ else if(p->sourceHeight & (p->minCUSize - 1))
{
uint32_t rem = p->sourceHeight & (p->minCUSize - 1);
uint32_t padsize = p->minCUSize - rem;
@@ -2372,9 +2611,6 @@ void Encoder::configure(x265_param *p)
if (p->bLogCuStats)
x265_log(p, X265_LOG_WARNING, "--cu-stats option is now deprecated\n");
- if (p->csvfn)
- x265_log(p, X265_LOG_WARNING, "libx265 no longer supports CSV file statistics\n");
-
if (p->log2MaxPocLsb < 4)
{
x265_log(p, X265_LOG_WARNING, "maximum of the picture order count can not be less than 4\n");
@@ -2406,6 +2642,20 @@ void Encoder::configure(x265_param *p)
p->bHDROpt = 0;
}
}
+
+ if (m_param->toneMapFile || p->bHDROpt || p->bEmitHDRSEI)
+ {
+ if (!p->bRepeatHeaders)
+ {
+ p->bRepeatHeaders = 1;
+ x265_log(p, X265_LOG_WARNING, "Turning on repeat-headers for HDR compatibility\n");
+ }
+ }
+
+ p->maxLog2CUSize = g_log2Size[p->maxCUSize];
+ p->maxCUDepth = p->maxLog2CUSize - g_log2Size[p->minCUSize];
+ p->unitSizeDepth = p->maxLog2CUSize - LOG2_UNIT_SIZE;
+ p->num4x4Partitions = (1U << (p->unitSizeDepth << 1));
}
void Encoder::allocAnalysis(x265_analysis_data* analysis)
@@ -2414,7 +2664,7 @@ void Encoder::allocAnalysis(x265_analysis_data* analysis)
analysis->interData = analysis->intraData = NULL;
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;
analysis_intra_data *intraData = (analysis_intra_data*)analysis->intraData;
@@ -2430,27 +2680,27 @@ void Encoder::allocAnalysis(x265_analysis_data* analysis)
int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
CHECKED_MALLOC_ZERO(analysis->wt, WeightParam, numPlanes * numDir);
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;
analysis_inter_data *interData = (analysis_inter_data*)analysis->interData;
CHECKED_MALLOC_ZERO(interData, analysis_inter_data, 1);
CHECKED_MALLOC(interData->depth, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->modes, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
CHECKED_MALLOC(interData->partSize, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->mergeFlag, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
}
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
CHECKED_MALLOC(interData->interDir, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
for (int dir = 0; dir < numDir; dir++)
{
CHECKED_MALLOC(interData->mvpIdx[dir], uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->refIdx[dir], int8_t, analysis->numPartitions * analysis->numCUsInFrame);
- CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
+ CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
}
/* Allocate intra in inter */
@@ -2480,51 +2730,56 @@ void Encoder::freeAnalysis(x265_analysis_data* analysis)
/* Early exit freeing weights alone if level is 1 (when there is no analysis inter/intra) */
if (analysis->sliceType > X265_TYPE_I && analysis->wt)
X265_FREE(analysis->wt);
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;
- if (analysis->intraData)
+ if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- if (m_param->analysisRefineLevel < 2)
- return;
-
- X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
- X265_FREE(analysis->intraData);
+ if (analysis->intraData)
+ {
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+ X265_FREE(analysis->intraData);
+ analysis->intraData = NULL;
+ }
}
- else if (analysis->interData)
+ else
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
- X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
- if (m_param->analysisRefineLevel > 4)
+ if (analysis->intraData)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
- X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+ X265_FREE(analysis->intraData);
+ analysis->intraData = NULL;
}
-
- if (m_param->analysisRefineLevel == 10)
+ if (analysis->interData)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
- int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
- for (int dir = 0; dir < numDir; dir++)
+ X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
+ if (m_param->analysisReuseLevel > 4)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
- X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
- X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
}
- if (analysis->sliceType == P_SLICE || m_param->bIntraInBFrames)
+ if (m_param->analysisReuseLevel == 10)
{
- X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
- X265_FREE(analysis->intraData);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
+ int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
+ for (int dir = 0; dir < numDir; dir++)
+ {
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
+ }
}
- }
- else
- X265_FREE(((analysis_inter_data*)analysis->interData)->ref);
+ else
+ X265_FREE(((analysis_inter_data*)analysis->interData)->ref);
- X265_FREE(analysis->interData);
+ X265_FREE(analysis->interData);
+ analysis->interData = NULL;
+ }
}
}
@@ -2532,13 +2787,13 @@ void Encoder::allocAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType)
{
analysis->analysisFramedata = NULL;
analysis2PassFrameData *analysisFrameData = (analysis2PassFrameData*)analysis->analysisFramedata;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
CHECKED_MALLOC_ZERO(analysisFrameData, analysis2PassFrameData, 1);
- CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, m_param->num4x4Partitions * numCUsInFrame);
if (m_param->rc.bStatRead)
{
CHECKED_MALLOC_ZERO(analysisFrameData->ctuDistortion, sse_t, numCUsInFrame);
@@ -2548,13 +2803,13 @@ void Encoder::allocAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType)
}
if (!IS_X265_TYPE_I(sliceType))
{
- CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC(analysisFrameData->modes, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC(analysisFrameData->modes, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
}
analysis->analysisFramedata = analysisFrameData;
@@ -2593,11 +2848,15 @@ void Encoder::freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType)
}
}
-void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
+void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x265_picture* picIn)
{
-#define X265_FREAD(val, size, readSize, fileOffset)\
- if (fread(val, size, readSize, fileOffset) != readSize)\
+#define X265_FREAD(val, size, readSize, fileOffset, src)\
+ if (!m_param->bUseAnalysisFile)\
+ {\
+ memcpy(val, src, (size * readSize));\
+ }\
+ else if (fread(val, size, readSize, fileOffset) != readSize)\
{\
x265_log(NULL, X265_LOG_ERROR, "Error reading analysis data\n");\
freeAnalysis(analysis);\
@@ -2610,67 +2869,98 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
uint32_t depthBytes = 0;
fseeko(m_analysisFile, totalConsumedBytes, SEEK_SET);
- int poc; uint32_t frameRecordSize;
- X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
+ const x265_analysis_data *picData = &(picIn->analysisData);
+ analysis_intra_data *intraPic = (analysis_intra_data *)picData->intraData;
+ analysis_inter_data *interPic = (analysis_inter_data *)picData->interData;
- uint64_t currentOffset = totalConsumedBytes;
+ int poc; uint32_t frameRecordSize;
+ X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+ X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+ X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));
- /* Seeking to the right frame Record */
- while (poc != curPoc && !feof(m_analysisFile))
+ if (m_param->bUseAnalysisFile)
{
- currentOffset += frameRecordSize;
- fseeko(m_analysisFile, currentOffset, SEEK_SET);
- X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
- }
+ uint64_t currentOffset = totalConsumedBytes;
- if (poc != curPoc || feof(m_analysisFile))
- {
- x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
- freeAnalysis(analysis);
- return;
+ /* Seeking to the right frame Record */
+ while (poc != curPoc && !feof(m_analysisFile))
+ {
+ currentOffset += frameRecordSize;
+ fseeko(m_analysisFile, currentOffset, SEEK_SET);
+ X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+ X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+ X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));
+ }
+ if (poc != curPoc || feof(m_analysisFile))
+ {
+ x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
+ freeAnalysis(analysis);
+ return;
+ }
}
/* Now arrived at the right frame, read the record */
analysis->poc = poc;
analysis->frameRecordSize = frameRecordSize;
- X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile);
- X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile);
+ X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile, &(picData->sliceType));
+ X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile, &(picData->bScenecut));
+ X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile, &(picData->satdCost));
+ X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile, &(picData->numCUsInFrame));
+ X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile, &(picData->numPartitions));
+ int scaledNumPartition = analysis->numPartitions;
+ int factor = 1 << m_param->scaleFactor;
+
+ if (m_param->scaleFactor)
+ analysis->numPartitions *= factor;
/* Memory is allocated for inter and intra analysis data based on the slicetype */
allocAnalysis(analysis);
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- analysis->sliceType = X265_TYPE_I;
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;
uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSizes = NULL;
tempBuf = X265_MALLOC(uint8_t, depthBytes * 3);
- X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * 3, m_analysisFile);
-
depthBuf = tempBuf;
modeBuf = tempBuf + depthBytes;
partSizes = tempBuf + 2 * depthBytes;
+ X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->depth);
+ X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+ X265_FREAD(partSizes, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->partSizes);
+
size_t count = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+ if (m_param->scaleFactor)
+ {
+ if (depthBuf[d] == 0)
+ depthBuf[d] = 1;
+ if (partSizes[d] == SIZE_NxN)
+ partSizes[d] = SIZE_2Nx2N;
+ }
memset(&((analysis_intra_data *)analysis->intraData)->depth[count], depthBuf[d], bytes);
memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], modeBuf[d], bytes);
memset(&((analysis_intra_data *)analysis->intraData)->partSizes[count], partSizes[d], bytes);
count += bytes;
}
- X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+
+ if (!m_param->scaleFactor)
+ {
+ X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+ }
+ else
+ {
+ uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+ X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+ for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+ memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+ X265_FREE(tempLumaBuf);
+ }
X265_FREE(tempBuf);
consumedBytes += frameRecordSize;
}
@@ -2679,8 +2969,8 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
{
uint32_t numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
- X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);
- if (m_param->analysisRefineLevel < 2)
+ X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile, (picIn->analysisData.wt));
+ if (m_param->analysisReuseLevel < 2)
return;
uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSize = NULL, *mergeFlag = NULL;
@@ -2688,9 +2978,9 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
MV* mv[2];
int8_t* refIdx[2];
- int numBuf = m_param->analysisRefineLevel > 4 ? 4 : 2;
+ int numBuf = m_param->analysisReuseLevel > 4 ? 4 : 2;
bool bIntraInInter = false;
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
numBuf++;
bIntraInInter = (analysis->sliceType == X265_TYPE_P || m_param->bIntraInBFrames);
@@ -2698,26 +2988,36 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
}
tempBuf = X265_MALLOC(uint8_t, depthBytes * numBuf);
- X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * numBuf, m_analysisFile);
-
depthBuf = tempBuf;
modeBuf = tempBuf + depthBytes;
- if (m_param->analysisRefineLevel > 4)
+
+ X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->depth);
+ X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->modes);
+
+ if (m_param->analysisReuseLevel > 4)
{
partSize = modeBuf + depthBytes;
mergeFlag = partSize + depthBytes;
- if (m_param->analysisRefineLevel == 10)
+ X265_FREAD(partSize, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->partSize);
+ X265_FREAD(mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mergeFlag);
+
+ if (m_param->analysisReuseLevel == 10)
{
interDir = mergeFlag + depthBytes;
- if (bIntraInInter) chromaDir = interDir + depthBytes;
+ X265_FREAD(interDir, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->interDir);
+ if (bIntraInInter)
+ {
+ chromaDir = interDir + depthBytes;
+ X265_FREAD(chromaDir, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+ }
for (uint32_t i = 0; i < numDir; i++)
{
- mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes * 3);
- X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile);
+ mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes);
refIdx[i] = X265_MALLOC(int8_t, depthBytes);
- X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile);
mv[i] = X265_MALLOC(MV, depthBytes);
- X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile);
+ X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mvpIdx[i]);
+ X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile, interPic->refIdx[i]);
+ X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile, interPic->mv[i]);
}
}
}
@@ -2726,28 +3026,37 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
for (uint32_t d = 0; d < depthBytes; d++)
{
int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+ if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && depthBuf[d] == 0)
+ depthBuf[d] = 1;
memset(&((analysis_inter_data *)analysis->interData)->depth[count], depthBuf[d], bytes);
memset(&((analysis_inter_data *)analysis->interData)->modes[count], modeBuf[d], bytes);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
+ if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && partSize[d] == SIZE_NxN)
+ partSize[d] = SIZE_2Nx2N;
memset(&((analysis_inter_data *)analysis->interData)->partSize[count], partSize[d], bytes);
- int numPU = nbPartsTable[(int)partSize[d]];
+ int numPU = (modeBuf[d] == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize[d]];
for (int pu = 0; pu < numPU; pu++)
{
if (pu) d++;
((analysis_inter_data *)analysis->interData)->mergeFlag[count + pu] = mergeFlag[d];
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
((analysis_inter_data *)analysis->interData)->interDir[count + pu] = interDir[d];
for (uint32_t i = 0; i < numDir; i++)
{
((analysis_inter_data *)analysis->interData)->mvpIdx[i][count + pu] = mvpIdx[i][d];
((analysis_inter_data *)analysis->interData)->refIdx[i][count + pu] = refIdx[i][d];
+ if (m_param->scaleFactor)
+ {
+ mv[i][d].x *= (int16_t)m_param->scaleFactor;
+ mv[i][d].y *= (int16_t)m_param->scaleFactor;
+ }
memcpy(&((analysis_inter_data *)analysis->interData)->mv[i][count + pu], &mv[i][d], sizeof(MV));
}
}
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], chromaDir[d], bytes);
}
count += bytes;
@@ -2755,7 +3064,7 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
X265_FREE(tempBuf);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
for (uint32_t i = 0; i < numDir; i++)
{
@@ -2764,10 +3073,23 @@ void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
X265_FREE(mv[i]);
}
if (bIntraInInter)
- X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+ {
+ if (!m_param->scaleFactor)
+ {
+ X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+ }
+ else
+ {
+ uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+ X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+ for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+ memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+ X265_FREE(tempLumaBuf);
+ }
+ }
}
else
- X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);
+ X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile, interPic->ref);
consumedBytes += frameRecordSize;
if (numDir == 1)
@@ -2789,8 +3111,8 @@ void Encoder::readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int curP
}\
uint32_t depthBytes = 0;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
int poc; uint32_t frameRecordSize;
@@ -2820,12 +3142,12 @@ void Encoder::readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int curP
double sum = 0, sqrSum = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
- int bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+ int bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
memset(&analysisFrameData->depth[count], depthBuf[d], bytes);
analysisFrameData->distortion[count] = distortionBuf[d];
analysisFrameData->ctuDistortion[ctuCount] += analysisFrameData->distortion[count];
count += bytes;
- if ((count % (size_t)NUM_4x4_PARTITIONS) == 0)
+ if ((count % (unsigned)m_param->num4x4Partitions) == 0)
{
analysisFrameData->scaledDistortion[ctuCount] = X265_LOG2(X265_MAX(analysisFrameData->ctuDistortion[ctuCount], 1));
sum += analysisFrameData->scaledDistortion[ctuCount];
@@ -2873,7 +3195,7 @@ void Encoder::readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int curP
count = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
- size_t bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+ size_t bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
for (int i = 0; i < numDir; i++)
{
for (size_t j = count, k = 0; k < bytes; j++, k++)
@@ -2927,7 +3249,7 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
analysis->frameRecordSize += sizeof(WeightParam) * numPlanes * numDir;
}
- if (m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseLevel > 1)
{
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
@@ -2975,25 +3297,25 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
interDataCTU->depth[depthBytes] = depth;
predMode = ctu->m_predMode[absPartIdx];
- if (m_param->analysisRefineLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
+ if (m_param->analysisReuseLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
predMode = 4; // used as indiacator if the block is coded as bidir
interDataCTU->modes[depthBytes] = predMode;
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
partSize = ctu->m_partSize[absPartIdx];
interDataCTU->partSize[depthBytes] = partSize;
/* Store per PU data */
- uint32_t numPU = nbPartsTable[(int)partSize];
+ uint32_t numPU = (predMode == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize];
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
uint32_t puabsPartIdx = ctu->getPUOffset(puIdx, absPartIdx) + absPartIdx;
if (puIdx) depthBytes++;
interDataCTU->mergeFlag[depthBytes] = ctu->m_mergeFlag[puabsPartIdx];
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
interDataCTU->interDir[depthBytes] = ctu->m_interDir[puabsPartIdx];
for (uint32_t dir = 0; dir < numDir; dir++)
@@ -3004,12 +3326,12 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
}
}
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
intraDataCTU->chromaModes[depthBytes] = ctu->m_chromaIntraDir[absPartIdx];
}
absPartIdx += ctu->m_numPartitions >> (depth * 2);
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
memcpy(&intraDataCTU->modes[ctu->m_cuAddr * ctu->m_numPartitions], ctu->m_lumaIntraDir, sizeof(uint8_t)* ctu->m_numPartitions);
}
}
@@ -3020,10 +3342,10 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
{
/* Add sizeof depth, modes, partSize, mergeFlag */
analysis->frameRecordSize += depthBytes * 2;
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
analysis->frameRecordSize += (depthBytes * 2);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
/* Add Size of interDir, mvpIdx, refIdx, mv, luma and chroma modes */
analysis->frameRecordSize += depthBytes;
@@ -3036,7 +3358,12 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
else
analysis->frameRecordSize += sizeof(int32_t)* analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir;
}
+ analysis->depthBytes = depthBytes;
}
+
+ if (!m_param->bUseAnalysisFile)
+ return;
+
X265_FWRITE(&analysis->frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
X265_FWRITE(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
X265_FWRITE(&analysis->poc, sizeof(int), 1, m_analysisFile);
@@ -3048,7 +3375,7 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
if (analysis->sliceType > X265_TYPE_I)
X265_FWRITE((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
@@ -3062,11 +3389,11 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->depth, sizeof(uint8_t), depthBytes, m_analysisFile);
X265_FWRITE(((analysis_inter_data*)analysis->interData)->modes, sizeof(uint8_t), depthBytes, m_analysisFile);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->partSize, sizeof(uint8_t), depthBytes, m_analysisFile);
X265_FWRITE(((analysis_inter_data*)analysis->interData)->mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->interDir, sizeof(uint8_t), depthBytes, m_analysisFile);
if (bIntraInInter) X265_FWRITE(((analysis_intra_data*)analysis->intraData)->chromaModes, sizeof(uint8_t), depthBytes, m_analysisFile);
@@ -3080,7 +3407,7 @@ void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncD
X265_FWRITE(((analysis_intra_data*)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
}
}
- if (m_param->analysisRefineLevel != 10)
+ if (m_param->analysisReuseLevel != 10)
X265_FWRITE(((analysis_inter_data*)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);
}
@@ -3099,8 +3426,8 @@ void Encoder::writeAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, FrameDa
}\
uint32_t depthBytes = 0;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)analysis2Pass->analysisFramedata;
diff --git a/source/encoder/encoder.h b/source/encoder/encoder.h
index 659d977..d456a89 100644
--- a/source/encoder/encoder.h
+++ b/source/encoder/encoder.h
@@ -31,11 +31,9 @@
#include "x265.h"
#include "nal.h"
#include "framedata.h"
-
-#ifdef ENABLE_DYNAMIC_HDR10
- #include "dynamicHDR10\hdr10plus.h"
+#ifdef ENABLE_HDR10_PLUS
+ #include "dynamicHDR10/hdr10plus.h"
#endif
-
struct x265_encoder {};
namespace X265_NS {
// private namespace
@@ -178,8 +176,10 @@ public:
int m_bToneMap; // Enables tone-mapping
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
const hdr10plus_api *m_hdr10plus_api;
+ uint8_t **cim;
+ int numCimInfo;
#endif
x265_sei_payload m_prevTonemapPayload;
@@ -187,7 +187,7 @@ public:
Encoder();
~Encoder()
{
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
if (m_prevTonemapPayload.payload != NULL)
X265_FREE(m_prevTonemapPayload.payload);
#endif
@@ -201,6 +201,8 @@ public:
int reconfigureParam(x265_param* encParam, x265_param* param);
+ void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc);
+
void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);
void fetchStats(x265_stats* stats, size_t statsSizeBytes);
@@ -223,7 +225,7 @@ public:
void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
- void readAnalysisFile(x265_analysis_data* analysis, int poc);
+ void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn);
void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
diff --git a/source/encoder/entropy.cpp b/source/encoder/entropy.cpp
index 190365b..ba591d0 100644
--- a/source/encoder/entropy.cpp
+++ b/source/encoder/entropy.cpp
@@ -700,7 +700,7 @@ void Entropy::codeSliceHeader(const Slice& slice, FrameData& encData, uint32_t s
// TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1
// We didn't support filter across slice board, so disable it now
- if (g_maxSlices <= 1)
+ if (encData.m_param->maxSlices <= 1)
{
bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false;
bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter;
@@ -783,7 +783,7 @@ void Entropy::encodeCU(const CUData& ctu, const CUGeom& cuGeom, uint32_t absPart
if (cuSplitFlag)
codeSplitFlag(ctu, absPartIdx, depth);
- if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth)
+ if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth)
{
uint32_t qNumParts = cuGeom.numPartitions >> 2;
if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)
@@ -863,7 +863,7 @@ uint32_t Entropy::bitsInterMode(const CUData& cu, uint32_t absPartIdx, uint32_t
case SIZE_nRx2N:
bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
- if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+ if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
if (cu.m_slice->m_sps->maxAMPDepth > depth)
{
@@ -888,7 +888,7 @@ void Entropy::finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth, b
uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx;
X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n");
- uint32_t granularityMask = g_maxCUSize - 1;
+ uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1;
uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx];
uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize;
uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize;
@@ -902,7 +902,7 @@ void Entropy::finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth, b
{
// Encode slice finish
uint32_t bTerminateSlice = ctu.m_bLastCuInSlice;
- if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress)
+ if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress)
bTerminateSlice = 1;
// The 1-terminating bit is added to all streams, so don't add it here when it's 1.
@@ -1512,7 +1512,7 @@ void Entropy::codePartSize(const CUData& cu, uint32_t absPartIdx, uint32_t depth
if (cu.isIntra(absPartIdx))
{
- if (depth == g_maxCUDepth)
+ if (depth == cu.m_encData->m_param->maxCUDepth)
encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]);
return;
}
@@ -1541,7 +1541,7 @@ void Entropy::codePartSize(const CUData& cu, uint32_t absPartIdx, uint32_t depth
case SIZE_nRx2N:
encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
- if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+ if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
if (cu.m_slice->m_sps->maxAMPDepth > depth)
{
diff --git a/source/encoder/frameencoder.cpp b/source/encoder/frameencoder.cpp
index 3d04f9a..f354fbe 100644
--- a/source/encoder/frameencoder.cpp
+++ b/source/encoder/frameencoder.cpp
@@ -124,7 +124,7 @@ bool FrameEncoder::init(Encoder *top, int numRows, int numCols)
range += !!(m_param->searchMethod < 2); /* diamond/hex range check lag */
range += NTAPS_LUMA / 2; /* subpel filter half-length */
range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */
- m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize);
+ m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize);
// NOTE: 2 times of numRows because both Encoder and Filter in same queue
if (!WaveFront::init(m_numRows * 2))
@@ -295,6 +295,11 @@ void FrameEncoder::threadMain()
while (m_threadActive)
{
+ if (m_param->bCTUInfo)
+ {
+ while (!m_frame->m_ctuInfo)
+ m_frame->m_copied.wait();
+ }
compressFrame();
m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */
m_enable.wait();
@@ -383,7 +388,7 @@ void FrameEncoder::compressFrame()
bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred;
WeightParam* reuseWP = NULL;
- if (m_param->analysisMode && (bUseWeightP || bUseWeightB))
+ if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB))
reuseWP = (WeightParam*)m_frame->m_analysisData.wt;
if (bUseWeightP || bUseWeightB)
@@ -392,7 +397,7 @@ void FrameEncoder::compressFrame()
m_cuStats.countWeightAnalyze++;
ScopedElapsedTime time(m_cuStats.weightAnalyzeTime);
#endif
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
{
for (int list = 0; list < slice->isInterB() + 1; list++)
{
@@ -431,7 +436,7 @@ void FrameEncoder::compressFrame()
slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic;
m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param);
}
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
{
for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
*(reuseWP++) = slice->m_weightPredTable[l][0][i];
@@ -664,7 +669,7 @@ void FrameEncoder::compressFrame()
if (writeSei)
{
SEICreativeIntentMeta sei;
- sei.cim = payload->payload;
+ sei.m_payload = payload->payload;
m_bs.resetBits();
sei.setSize(payload->payloadSize);
sei.write(m_bs, *slice->m_sps);
@@ -832,7 +837,7 @@ void FrameEncoder::compressFrame()
}
else if (m_param->decodedPictureHashSEI == 3)
{
- uint32_t cuHeight = g_maxCUSize;
+ uint32_t cuHeight = m_param->maxCUSize;
m_checksum[0] = 0;
@@ -872,43 +877,52 @@ void FrameEncoder::compressFrame()
m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount;
m_frame->m_encData->m_frameStats.percent8x8Skip = (double)totalSkip / totalCuCount;
}
- for (uint32_t i = 0; i < m_numRows; i++)
+
+ if (m_param->csvLogLevel >= 1)
{
- m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
- m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
- m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
- m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
- m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
- m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
- m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
- m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t i = 0; i < m_numRows; i++)
{
- m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
- m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
- for (int m = 0; m < INTER_MODES; m++)
- m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+ m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
+ m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
+ m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
+ m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
+ m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
+ m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
+ m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
+ m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
+ m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
+ for (int m = 0; m < INTER_MODES; m++)
+ m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+ for (int n = 0; n < INTRA_MODES; n++)
+ m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+ }
+ }
+ m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
+
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
for (int n = 0; n < INTRA_MODES; n++)
- m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+ m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
+ cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
}
}
- m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+
+ if (m_param->csvLogLevel >= 2)
{
- m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- for (int n = 0; n < INTRA_MODES; n++)
- m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
- cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
}
m_bs.resetBits();
@@ -1096,7 +1110,7 @@ void FrameEncoder::compressFrame()
/* Accumulate CU statistics from each worker thread, we could report
* per-frame stats here, but currently we do not. */
for (int i = 0; i < numTLD; i++)
- m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]);
+ m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param);
#endif
m_endFrameTime = x265_mdate();
@@ -1106,7 +1120,7 @@ void FrameEncoder::encodeSlice(uint32_t sliceAddr)
{
Slice* slice = m_frame->m_encData->m_slice;
const uint32_t widthInLCUs = slice->m_sps->numCuInWidth;
- const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS;
+ const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions;
const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1;
SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL;
@@ -1208,7 +1222,6 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
const uint32_t row = (uint32_t)intRow;
CTURow& curRow = m_rows[row];
- tld.analysis.m_param = m_param;
if (m_param->bEnableWavefront)
{
ScopedLock self(curRow.lock);
@@ -1241,7 +1254,7 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
- uint32_t noOfBlocks = g_maxCUSize / 16;
+ uint32_t noOfBlocks = m_param->maxCUSize / 16;
const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0;
const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 1 : 0;
const uint32_t sliceId = curRow.sliceId;
@@ -1320,8 +1333,8 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
// TODO: specially case handle on first and last row
// Initialize restrict on MV range in slices
- tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4;
- tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4);
+ tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
+ tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);
// Handle single row slice
if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)
@@ -1361,8 +1374,8 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
cuStat.baseQp = curEncData.m_rowStat[row].rowQp;
/* TODO: use defines from slicetype.h for lowres block size */
- uint32_t block_y = (ctu->m_cuPelY >> g_maxLog2CUSize) * noOfBlocks;
- uint32_t block_x = (ctu->m_cuPelX >> g_maxLog2CUSize) * noOfBlocks;
+ uint32_t block_y = (ctu->m_cuPelY >> m_param->maxLog2CUSize) * noOfBlocks;
+ uint32_t block_x = (ctu->m_cuPelX >> m_param->maxLog2CUSize) * noOfBlocks;
cuStat.vbvCost = 0;
cuStat.intraVbvCost = 0;
@@ -1473,11 +1486,11 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
curRow.rowStats.coeffBits += best.coeffBits;
curRow.rowStats.miscBits += best.totalBits - (best.mvBits + best.coeffBits);
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
{
/* 1 << shift == number of 8x8 blocks at current depth */
- int shift = 2 * (g_maxCUDepth - depth);
- int cuSize = g_maxCUSize >> depth;
+ int shift = 2 * (m_param->maxCUDepth - depth);
+ int cuSize = m_param->maxCUSize >> depth;
if (cuSize == 8)
curRow.rowStats.intra8x8Cnt += (int)(frameLog.cntIntra[depth] + frameLog.cntIntraNxN);
@@ -1496,7 +1509,7 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
curRow.rowStats.resEnergy += best.resEnergy;
curRow.rowStats.cntIntraNxN += frameLog.cntIntraNxN;
curRow.rowStats.totalCu += frameLog.totalCu;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
{
curRow.rowStats.cntSkipCu[depth] += frameLog.cntSkipCu[depth];
curRow.rowStats.cntMergeCu[depth] += frameLog.cntMergeCu[depth];
@@ -1510,14 +1523,17 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
x265_emms();
if (bIsVbv)
- {
- // Update encoded bits, satdCost, baseQP for each CU
- curEncData.m_rowStat[row].rowSatd += curEncData.m_cuStat[cuAddr].vbvCost;
- curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
- curEncData.m_rowStat[row].encodedBits += curEncData.m_cuStat[cuAddr].totalBits;
- curEncData.m_rowStat[row].sumQpRc += curEncData.m_cuStat[cuAddr].baseQp;
- curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
-
+ {
+ // Update encoded bits, satdCost, baseQP for each CU if tune grain is disabled
+ if ((m_param->bEnableWavefront && (!cuAddr || !m_param->rc.bEnableConstVbv)) || !m_param->bEnableWavefront)
+ {
+ curEncData.m_rowStat[row].rowSatd += curEncData.m_cuStat[cuAddr].vbvCost;
+ curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
+ curEncData.m_rowStat[row].encodedBits += curEncData.m_cuStat[cuAddr].totalBits;
+ curEncData.m_rowStat[row].sumQpRc += curEncData.m_cuStat[cuAddr].baseQp;
+ curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
+ }
+
// If current block is at row end checkpoint, call vbv ratecontrol.
if (!m_param->bEnableWavefront && col == numCols - 1)
@@ -1553,6 +1569,24 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
else if (m_param->bEnableWavefront && row == col && row)
{
+ if (m_param->rc.bEnableConstVbv)
+ {
+ int32_t startCuAddr = numCols * row;
+ int32_t EndCuAddr = startCuAddr + col;
+ for (int32_t r = row; r >= 0; r--)
+ {
+ for (int32_t c = startCuAddr; c <= EndCuAddr && c <= (int32_t)numCols * (r + 1) - 1; c++)
+ {
+ curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
+ curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
+ curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
+ curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
+ curEncData.m_rowStat[r].numEncodedCUs = c;
+ }
+ startCuAddr = EndCuAddr - numCols;
+ EndCuAddr = startCuAddr + 1;
+ }
+ }
double qpBase = curEncData.m_cuStat[cuAddr].baseQp;
int reEncode = m_top->m_rateControl->rowVbvRateControl(m_frame, row, &m_rce, qpBase);
qpBase = x265_clip3((double)m_param->rc.qpMin, (double)m_param->rc.qpMax, qpBase);
@@ -1648,6 +1682,23 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
}
/** this row of CTUs has been compressed **/
+ if (m_param->bEnableWavefront && m_param->rc.bEnableConstVbv)
+ {
+ if (row == m_numRows - 1)
+ {
+ for (int32_t r = 0; r < (int32_t)m_numRows; r++)
+ {
+ for (int32_t c = curEncData.m_rowStat[r].numEncodedCUs + 1; c < (int32_t)numCols * (r + 1); c++)
+ {
+ curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
+ curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
+ curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
+ curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
+ curEncData.m_rowStat[r].numEncodedCUs = c;
+ }
+ }
+ }
+ }
/* If encoding with ABR, update update bits and complexity in rate control
* after a number of rows so the next frame's rateControlStart has more
@@ -1729,7 +1780,6 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
}
}
- tld.analysis.m_param = NULL;
curRow.busy = false;
// CHECK_ME: Does it always FALSE condition?
@@ -1741,73 +1791,36 @@ void FrameEncoder::processRowEncoder(int intRow, ThreadLocalData& tld)
int FrameEncoder::collectCTUStatistics(const CUData& ctu, FrameStats* log)
{
int totQP = 0;
- if (ctu.m_slice->m_sliceType == I_SLICE)
+ uint32_t depth = 0;
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
{
- uint32_t depth = 0;
- for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
- {
- depth = ctu.m_cuDepth[absPartIdx];
-
- log->totalCu++;
- log->cntIntra[depth]++;
- totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
-
- if (ctu.m_predMode[absPartIdx] == MODE_NONE)
- {
- log->totalCu--;
- log->cntIntra[depth]--;
- }
- else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
- {
- /* TODO: log intra modes at absPartIdx +0 to +3 */
- X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
- log->cntIntraNxN++;
- log->cntIntra[depth]--;
- }
- else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
- log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
- else
- log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
- }
+ depth = ctu.m_cuDepth[absPartIdx];
+ totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
}
- else
+
+ if (m_param->csvLogLevel >= 1 || m_param->rc.bStatWrite)
{
- uint32_t depth = 0;
- for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+ if (ctu.m_slice->m_sliceType == I_SLICE)
{
- depth = ctu.m_cuDepth[absPartIdx];
-
- log->totalCu++;
- totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
-
- if (ctu.m_predMode[absPartIdx] == MODE_NONE)
- log->totalCu--;
- else if (ctu.isSkipped(absPartIdx))
+ depth = 0;
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
{
- if (ctu.m_mergeFlag[0])
- log->cntMergeCu[depth]++;
- else
- log->cntSkipCu[depth]++;
- }
- else if (ctu.isInter(absPartIdx))
- {
- log->cntInter[depth]++;
+ depth = ctu.m_cuDepth[absPartIdx];
- if (ctu.m_partSize[absPartIdx] < AMP_ID)
- log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
- else
- log->cuInterDistribution[depth][AMP_ID]++;
- }
- else if (ctu.isIntra(absPartIdx))
- {
+ log->totalCu++;
log->cntIntra[depth]++;
- if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
+ if (ctu.m_predMode[absPartIdx] == MODE_NONE)
+ {
+ log->totalCu--;
+ log->cntIntra[depth]--;
+ }
+ else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
{
+ /* TODO: log intra modes at absPartIdx +0 to +3 */
X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
log->cntIntraNxN++;
log->cntIntra[depth]--;
- /* TODO: log intra modes at absPartIdx +0 to +3 */
}
else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
@@ -1815,6 +1828,51 @@ int FrameEncoder::collectCTUStatistics(const CUData& ctu, FrameStats* log)
log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
}
}
+ else
+ {
+ depth = 0;
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+ {
+ depth = ctu.m_cuDepth[absPartIdx];
+
+ log->totalCu++;
+
+ if (ctu.m_predMode[absPartIdx] == MODE_NONE)
+ log->totalCu--;
+ else if (ctu.isSkipped(absPartIdx))
+ {
+ if (ctu.m_mergeFlag[0])
+ log->cntMergeCu[depth]++;
+ else
+ log->cntSkipCu[depth]++;
+ }
+ else if (ctu.isInter(absPartIdx))
+ {
+ log->cntInter[depth]++;
+
+ if (ctu.m_partSize[absPartIdx] < AMP_ID)
+ log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
+ else
+ log->cuInterDistribution[depth][AMP_ID]++;
+ }
+ else if (ctu.isIntra(absPartIdx))
+ {
+ log->cntIntra[depth]++;
+
+ if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
+ {
+ X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
+ log->cntIntraNxN++;
+ log->cntIntra[depth]--;
+ /* TODO: log intra modes at absPartIdx +0 to +3 */
+ }
+ else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
+ log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
+ else
+ log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
+ }
+ }
+ }
}
return totQP;
diff --git a/source/encoder/framefilter.cpp b/source/encoder/framefilter.cpp
index d685f27..37605e1 100644
--- a/source/encoder/framefilter.cpp
+++ b/source/encoder/framefilter.cpp
@@ -35,107 +35,126 @@ using namespace X265_NS;
static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height);
static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt);
-static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+namespace X265_NS
{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
- for (int16_t x = 0; x < stride - 4; x++)
+ static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 4] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
+ for (int16_t x = 0; x < stride - 4; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 4] - pix[x];
+ }
}
-}
-static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
- for (int16_t x = 0; x < stride - 8; x++)
+ static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 8] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
+ for (int16_t x = 0; x < stride - 8; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 8] - pix[x];
+ }
}
-}
-static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
- pix[8] + pix[9] + pix[10] + pix[11];
- for (int16_t x = 0; x < stride - 12; x++)
+ static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 12] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+ pix[8] + pix[9] + pix[10] + pix[11];
+ for (int16_t x = 0; x < stride - 12; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 12] - pix[x];
+ }
}
-}
-static void integral_init16h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
- for (int16_t x = 0; x < stride - 16; x++)
+ static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 16] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
+ for (int16_t x = 0; x < stride - 16; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 16] - pix[x];
+ }
}
-}
-static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
- pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
- for (int16_t x = 0; x < stride - 24; x++)
+ static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 24] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+ pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
+ for (int16_t x = 0; x < stride - 24; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 24] - pix[x];
+ }
}
-}
-static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
- pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
- pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
- for (int16_t x = 0; x < stride - 32; x++)
+ static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
{
- sum[x] = v + sum[x - stride];
- v += pix[x + 32] - pix[x];
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+ pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
+ pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
+ for (int16_t x = 0; x < stride - 32; x++)
+ {
+ sum[x] = v + sum[x - stride];
+ v += pix[x + 32] - pix[x];
+ }
}
-}
-static void integral_init4v(uint32_t *sum4, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum4[x] = sum4[x + 4 * stride] - sum4[x];
-}
+ static void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum4[x] = sum4[x + 4 * stride] - sum4[x];
+ }
-static void integral_init8v(uint32_t *sum8, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum8[x] = sum8[x + 8 * stride] - sum8[x];
-}
+ static void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum8[x] = sum8[x + 8 * stride] - sum8[x];
+ }
-static void integral_init12v(uint32_t *sum12, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum12[x] = sum12[x + 12 * stride] - sum12[x];
-}
+ static void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum12[x] = sum12[x + 12 * stride] - sum12[x];
+ }
-static void integral_init16v(uint32_t *sum16, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum16[x] = sum16[x + 16 * stride] - sum16[x];
-}
+ static void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum16[x] = sum16[x + 16 * stride] - sum16[x];
+ }
-static void integral_init24v(uint32_t *sum24, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum24[x] = sum24[x + 24 * stride] - sum24[x];
-}
+ static void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum24[x] = sum24[x + 24 * stride] - sum24[x];
+ }
-static void integral_init32v(uint32_t *sum32, intptr_t stride)
-{
- for (int x = 0; x < stride; x++)
- sum32[x] = sum32[x + 32 * stride] - sum32[x];
+ static void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+ {
+ for (int x = 0; x < stride; x++)
+ sum32[x] = sum32[x + 32 * stride] - sum32[x];
+ }
+
+ void setupSeaIntegralPrimitives_c(EncoderPrimitives &p)
+ {
+ p.integral_initv[INTEGRAL_4] = integral_init4v_c;
+ p.integral_initv[INTEGRAL_8] = integral_init8v_c;
+ p.integral_initv[INTEGRAL_12] = integral_init12v_c;
+ p.integral_initv[INTEGRAL_16] = integral_init16v_c;
+ p.integral_initv[INTEGRAL_24] = integral_init24v_c;
+ p.integral_initv[INTEGRAL_32] = integral_init32v_c;
+ p.integral_inith[INTEGRAL_4] = integral_init4h_c;
+ p.integral_inith[INTEGRAL_8] = integral_init8h_c;
+ p.integral_inith[INTEGRAL_12] = integral_init12h_c;
+ p.integral_inith[INTEGRAL_16] = integral_init16h_c;
+ p.integral_inith[INTEGRAL_24] = integral_init24h_c;
+ p.integral_inith[INTEGRAL_32] = integral_init32h_c;
+ }
}
void FrameFilter::destroy()
@@ -166,8 +185,8 @@ void FrameFilter::init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t
m_pad[0] = top->m_sps.conformanceWindow.rightOffset;
m_pad[1] = top->m_sps.conformanceWindow.bottomOffset;
m_saoRowDelay = m_param->bEnableLoopFilter ? 1 : 0;
- m_lastHeight = (m_param->sourceHeight % g_maxCUSize) ? (m_param->sourceHeight % g_maxCUSize) : g_maxCUSize;
- m_lastWidth = (m_param->sourceWidth % g_maxCUSize) ? (m_param->sourceWidth % g_maxCUSize) : g_maxCUSize;
+ m_lastHeight = (m_param->sourceHeight % m_param->maxCUSize) ? (m_param->sourceHeight % m_param->maxCUSize) : m_param->maxCUSize;
+ m_lastWidth = (m_param->sourceWidth % m_param->maxCUSize) ? (m_param->sourceWidth % m_param->maxCUSize) : m_param->maxCUSize;
integralCompleted.set(0);
if (m_param->bEnableSsim)
@@ -195,7 +214,7 @@ void FrameFilter::init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t
for(int row = 0; row < numRows; row++)
{
// Setting maximum bound information
- m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : g_maxCUSize;
+ m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : m_param->maxCUSize;
m_parallelFilter[row].m_row = row;
m_parallelFilter[row].m_rowAddr = row * numCols;
m_parallelFilter[row].m_frameFilter = this;
@@ -281,7 +300,7 @@ static void origCUSampleRestoration(const CUData* cu, const CUGeom& cuGeom, Fram
void FrameFilter::ParallelFilter::copySaoAboveRef(const CUData *ctu, PicYuv* reconPic, uint32_t cuAddr, int col)
{
// Copy SAO Top Reference Pixels
- int ctuWidth = g_maxCUSize;
+ int ctuWidth = ctu->m_encData->m_param->maxCUSize;
const pixel* recY = reconPic->getPlaneAddr(0, cuAddr) - (ctu->m_bFirstRowInSlice ? 0 : reconPic->m_stride);
// Luma
@@ -682,8 +701,8 @@ void FrameFilter::processPostRow(int row)
intptr_t stride2 = m_frame->m_fencPic->m_stride;
uint32_t bEnd = ((row) == (this->m_numRows - 1));
uint32_t bStart = (row == 0);
- uint32_t minPixY = row * g_maxCUSize - 4 * !bStart;
- uint32_t maxPixY = X265_MIN((row + 1) * g_maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
+ uint32_t minPixY = row * m_param->maxCUSize - 4 * !bStart;
+ uint32_t maxPixY = X265_MIN((row + 1) * m_param->maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
uint32_t ssim_cnt;
x265_emms();
@@ -749,7 +768,7 @@ void FrameFilter::processPostRow(int row)
uint32_t width = reconPic->m_picWidth;
uint32_t height = m_parallelFilter[row].getCUHeight();
intptr_t stride = reconPic->m_stride;
- uint32_t cuHeight = g_maxCUSize;
+ uint32_t cuHeight = m_param->maxCUSize;
if (!row)
m_frameEncoder->m_checksum[0] = 0;
@@ -793,18 +812,18 @@ void FrameFilter::computeMEIntegral(int row)
}
int stride = (int)m_frame->m_reconPic->m_stride;
- int padX = g_maxCUSize + 32;
- int padY = g_maxCUSize + 16;
+ int padX = m_param->maxCUSize + 32;
+ int padY = m_param->maxCUSize + 16;
int numCuInHeight = m_frame->m_encData->m_slice->m_sps->numCuInHeight;
- int maxHeight = numCuInHeight * g_maxCUSize;
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
int startRow = 0;
if (m_param->interlaceMode)
- startRow = (row * g_maxCUSize >> 1);
+ startRow = (row * m_param->maxCUSize >> 1);
else
- startRow = row * g_maxCUSize;
+ startRow = row * m_param->maxCUSize;
- int height = lastRow ? (maxHeight + g_maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * g_maxCUSize) + g_maxCUSize);
+ int height = lastRow ? (maxHeight + m_param->maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * m_param->maxCUSize) + m_param->maxCUSize);
if (!row)
{
@@ -833,47 +852,47 @@ void FrameFilter::computeMEIntegral(int row)
uint32_t *sum4x4 = m_frame->m_encData->m_meIntegral[11] + (y + 1) * stride - padX;
/*For width = 32 */
- integral_init32h(sum32x32, pix, stride);
+ primitives.integral_inith[INTEGRAL_32](sum32x32, pix, stride);
if (y >= 32 - padY)
- integral_init32v(sum32x32 - 32 * stride, stride);
- integral_init32h(sum32x24, pix, stride);
+ primitives.integral_initv[INTEGRAL_32](sum32x32 - 32 * stride, stride);
+ primitives.integral_inith[INTEGRAL_32](sum32x24, pix, stride);
if (y >= 24 - padY)
- integral_init24v(sum32x24 - 24 * stride, stride);
- integral_init32h(sum32x8, pix, stride);
+ primitives.integral_initv[INTEGRAL_24](sum32x24 - 24 * stride, stride);
+ primitives.integral_inith[INTEGRAL_32](sum32x8, pix, stride);
if (y >= 8 - padY)
- integral_init8v(sum32x8 - 8 * stride, stride);
+ primitives.integral_initv[INTEGRAL_8](sum32x8 - 8 * stride, stride);
/*For width = 24 */
- integral_init24h(sum24x32, pix, stride);
+ primitives.integral_inith[INTEGRAL_24](sum24x32, pix, stride);
if (y >= 32 - padY)
- integral_init32v(sum24x32 - 32 * stride, stride);
+ primitives.integral_initv[INTEGRAL_32](sum24x32 - 32 * stride, stride);
/*For width = 16 */
- integral_init16h(sum16x16, pix, stride);
+ primitives.integral_inith[INTEGRAL_16](sum16x16, pix, stride);
if (y >= 16 - padY)
- integral_init16v(sum16x16 - 16 * stride, stride);
- integral_init16h(sum16x12, pix, stride);
+ primitives.integral_initv[INTEGRAL_16](sum16x16 - 16 * stride, stride);
+ primitives.integral_inith[INTEGRAL_16](sum16x12, pix, stride);
if (y >= 12 - padY)
- integral_init12v(sum16x12 - 12 * stride, stride);
- integral_init16h(sum16x4, pix, stride);
+ primitives.integral_initv[INTEGRAL_12](sum16x12 - 12 * stride, stride);
+ primitives.integral_inith[INTEGRAL_16](sum16x4, pix, stride);
if (y >= 4 - padY)
- integral_init4v(sum16x4 - 4 * stride, stride);
+ primitives.integral_initv[INTEGRAL_4](sum16x4 - 4 * stride, stride);
/*For width = 12 */
- integral_init12h(sum12x16, pix, stride);
+ primitives.integral_inith[INTEGRAL_12](sum12x16, pix, stride);
if (y >= 16 - padY)
- integral_init16v(sum12x16 - 16 * stride, stride);
+ primitives.integral_initv[INTEGRAL_16](sum12x16 - 16 * stride, stride);
/*For width = 8 */
- integral_init8h(sum8x32, pix, stride);
+ primitives.integral_inith[INTEGRAL_8](sum8x32, pix, stride);
if (y >= 32 - padY)
- integral_init32v(sum8x32 - 32 * stride, stride);
- integral_init8h(sum8x8, pix, stride);
+ primitives.integral_initv[INTEGRAL_32](sum8x32 - 32 * stride, stride);
+ primitives.integral_inith[INTEGRAL_8](sum8x8, pix, stride);
if (y >= 8 - padY)
- integral_init8v(sum8x8 - 8 * stride, stride);
+ primitives.integral_initv[INTEGRAL_8](sum8x8 - 8 * stride, stride);
/*For width = 4 */
- integral_init4h(sum4x16, pix, stride);
+ primitives.integral_inith[INTEGRAL_4](sum4x16, pix, stride);
if (y >= 16 - padY)
- integral_init16v(sum4x16 - 16 * stride, stride);
- integral_init4h(sum4x4, pix, stride);
+ primitives.integral_initv[INTEGRAL_16](sum4x16 - 16 * stride, stride);
+ primitives.integral_inith[INTEGRAL_4](sum4x4, pix, stride);
if (y >= 4 - padY)
- integral_init4v(sum4x4 - 4 * stride, stride);
+ primitives.integral_initv[INTEGRAL_4](sum4x4 - 4 * stride, stride);
}
m_parallelFilter[row].m_frameFilter->integralCompleted.set(1);
}
diff --git a/source/encoder/framefilter.h b/source/encoder/framefilter.h
index 1bbcabb..19a6d64 100644
--- a/source/encoder/framefilter.h
+++ b/source/encoder/framefilter.h
@@ -123,7 +123,7 @@ public:
uint32_t getCUWidth(int colNum) const
{
- return (colNum == (int)m_numCols - 1) ? m_lastWidth : g_maxCUSize;
+ return (colNum == (int)m_numCols - 1) ? m_lastWidth : m_param->maxCUSize;
}
void init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t numCols);
diff --git a/source/encoder/motion.cpp b/source/encoder/motion.cpp
index fba2419..0cd7de5 100644
--- a/source/encoder/motion.cpp
+++ b/source/encoder/motion.cpp
@@ -598,6 +598,139 @@ void MotionEstimate::StarPatternSearch(ReferencePlanes *ref,
}
}
+void MotionEstimate::refineMV(ReferencePlanes* ref,
+ const MV& mvmin,
+ const MV& mvmax,
+ const MV& qmvp,
+ MV& outQMv)
+{
+ ALIGN_VAR_16(int, costs[16]);
+ if (ctuAddr >= 0)
+ blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
+ intptr_t stride = ref->lumaStride;
+ pixel* fenc = fencPUYuv.m_buf[0];
+ pixel* fref = ref->fpelPlane[0] + blockOffset;
+
+ setMVP(qmvp);
+
+ MV qmvmin = mvmin.toQPel();
+ MV qmvmax = mvmax.toQPel();
+
+ /* The term cost used here means satd/sad values for that particular search.
+ * The costs used in ME integer search only includes the SAD cost of motion
+ * residual and sqrtLambda times MVD bits. The subpel refine steps use SATD
+ * cost of residual and sqrtLambda * MVD bits.
+ */
+
+ // measure SATD cost at clipped QPEL MVP
+ MV pmv = qmvp.clipped(qmvmin, qmvmax);
+ MV bestpre = pmv;
+ int bprecost;
+
+ bprecost = subpelCompare(ref, pmv, sad);
+
+ /* re-measure full pel rounded MVP with SAD as search start point */
+ MV bmv = pmv.roundToFPel();
+ int bcost = bprecost;
+ if (pmv.isSubpel())
+ bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
+
+ /* square refine */
+ int dir = 0;
+ COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[0], dir, 1);
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[1], dir, 2);
+ COPY2_IF_LT(bcost, costs[2], dir, 3);
+ COPY2_IF_LT(bcost, costs[3], dir, 4);
+ COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[0], dir, 5);
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[1], dir, 6);
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[2], dir, 7);
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+ COPY2_IF_LT(bcost, costs[3], dir, 8);
+ bmv += square1[dir];
+
+ if (bprecost < bcost)
+ {
+ bmv = bestpre;
+ bcost = bprecost;
+ }
+ else
+ bmv = bmv.toQPel(); // promote search bmv to qpel
+
+ // TO DO: Change SubpelWorkload to fine tune MV
+ // Now it is set to 5 for experiment.
+ // const SubpelWorkload& wl = workload[this->subpelRefine];
+ const SubpelWorkload& wl = workload[5];
+
+ pixelcmp_t hpelcomp;
+
+ if (wl.hpel_satd)
+ {
+ bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+ hpelcomp = satd;
+ }
+ else
+ hpelcomp = sad;
+
+ for (int iter = 0; iter < wl.hpel_iters; iter++)
+ {
+ int bdir = 0;
+ for (int i = 1; i <= wl.hpel_dirs; i++)
+ {
+ MV qmv = bmv + square1[i] * 2;
+
+ // check mv range for slice bound
+ if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+ continue;
+
+ int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
+ COPY2_IF_LT(bcost, cost, bdir, i);
+ }
+
+ if (bdir)
+ bmv += square1[bdir] * 2;
+ else
+ break;
+ }
+
+ /* if HPEL search used SAD, remeasure with SATD before QPEL */
+ if (!wl.hpel_satd)
+ bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+
+ for (int iter = 0; iter < wl.qpel_iters; iter++)
+ {
+ int bdir = 0;
+ for (int i = 1; i <= wl.qpel_dirs; i++)
+ {
+ MV qmv = bmv + square1[i];
+
+ // check mv range for slice bound
+ if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+ continue;
+
+ int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
+ COPY2_IF_LT(bcost, cost, bdir, i);
+ }
+
+ if (bdir)
+ bmv += square1[bdir];
+ else
+ break;
+ }
+
+ // check mv range for slice bound
+ X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!");
+
+ x265_emms();
+ outQMv = bmv;
+}
+
int MotionEstimate::motionEstimate(ReferencePlanes *ref,
const MV & mvmin,
const MV & mvmax,
@@ -606,6 +739,7 @@ int MotionEstimate::motionEstimate(ReferencePlanes *ref,
const MV * mvc,
int merange,
MV & outQMv,
+ uint32_t maxSlices,
pixel * srcReferencePlane)
{
ALIGN_VAR_16(int, costs[16]);
@@ -1306,7 +1440,7 @@ me_hex2:
const SubpelWorkload& wl = workload[this->subpelRefine];
// check mv range for slice bound
- if ((g_maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
+ if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
{
bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
diff --git a/source/encoder/motion.h b/source/encoder/motion.h
index 866b977..7d3653e 100644
--- a/source/encoder/motion.h
+++ b/source/encoder/motion.h
@@ -92,7 +92,8 @@ public:
chromaSatd(refYuv.getCrAddr(puPartIdx), refYuv.m_csize, fencPUYuv.m_buf[2], fencPUYuv.m_csize);
}
- int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, pixel *srcReferencePlane = 0);
+ void refineMV(ReferencePlanes* ref, const MV& mvmin, const MV& mvmax, const MV& qmvp, MV& outQMv);
+ int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, uint32_t maxSlices, pixel *srcReferencePlane = 0);
int subpelCompare(ReferencePlanes* ref, const MV &qmv, pixelcmp_t);
diff --git a/source/encoder/ratecontrol.cpp b/source/encoder/ratecontrol.cpp
index c6346d7..77c66cf 100644
--- a/source/encoder/ratecontrol.cpp
+++ b/source/encoder/ratecontrol.cpp
@@ -2272,7 +2272,7 @@ double RateControl::predictRowsSizeSum(Frame* curFrame, RateControlEntry* rce, d
uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
double refQScale = 0;
- if (picType != I_SLICE)
+ if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
{
FrameData& refEncData = *refFrame->m_encData;
uint32_t endCuAddr = maxCols * (row + 1);
@@ -2301,7 +2301,8 @@ double RateControl::predictRowsSizeSum(Frame* curFrame, RateControlEntry* rce, d
&& refFrame
&& refFrame->m_encData->m_slice->m_sliceType == picType
&& refQScale > 0
- && refRowSatdCost > 0)
+ && refRowBits > 0
+ && !m_param->rc.bEnableConstVbv)
{
if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
{
@@ -2343,7 +2344,7 @@ int RateControl::rowVbvRateControl(Frame* curFrame, uint32_t row, RateControlEnt
}
rowSatdCost >>= X265_DEPTH - 8;
updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
- if (curEncData.m_slice->m_sliceType != I_SLICE)
+ if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
{
Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)
@@ -2613,7 +2614,7 @@ int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry*
for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
avgQpAq += curEncData.m_rowStat[i].sumQpAq;
- avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
+ avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
curEncData.m_avgQpAq = avgQpAq;
}
else
@@ -2711,6 +2712,13 @@ int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry*
{
*filler = updateVbv(actualBits, rce);
+ curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
+ for (int i = 0; i < 4; i++)
+ {
+ curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
+ curFrame->m_rcData->count[i] = m_pred[i].count;
+ curFrame->m_rcData->offset[i] = m_pred[i].offset;
+ }
if (m_param->bEmitHRDSEI)
{
const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
diff --git a/source/encoder/reference.cpp b/source/encoder/reference.cpp
index e843061..f99a179 100644
--- a/source/encoder/reference.cpp
+++ b/source/encoder/reference.cpp
@@ -72,12 +72,12 @@ int MotionReference::init(PicYuv* recPic, WeightParam *wp, const x265_param& p)
if (wp)
{
- uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+ uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
int marginX = reconPic->m_lumaMarginX;
int marginY = reconPic->m_lumaMarginY;
intptr_t stride = reconPic->m_stride;
- int cuHeight = g_maxCUSize;
+ int cuHeight = p.maxCUSize;
for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
{
@@ -127,15 +127,15 @@ void MotionReference::applyWeight(uint32_t finishedRows, uint32_t maxNumRows, ui
int marginY = reconPic->m_lumaMarginY;
intptr_t stride = reconPic->m_stride;
int width = reconPic->m_picWidth;
- int height = (finishedRows - numWeightedRows) * g_maxCUSize;
+ int height = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
/* the last row may be partial height */
if (finishedRows == maxNumRows - 1)
{
- const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
+ const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
- height += leftRows ? leftRows : g_maxCUSize;
+ height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
}
- int cuHeight = g_maxCUSize;
+ int cuHeight = reconPic->m_param->maxCUSize;
for (int c = 0; c < numInterpPlanes; c++)
{
diff --git a/source/encoder/sao.cpp b/source/encoder/sao.cpp
index 2530bb8..a74db48 100644
--- a/source/encoder/sao.cpp
+++ b/source/encoder/sao.cpp
@@ -98,8 +98,8 @@ bool SAO::create(x265_param* param, int initCommon)
m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp);
m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp);
- m_numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
- m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+ m_numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
const pixel maxY = (1 << X265_DEPTH) - 1;
const pixel rangeExt = maxY >> 1;
@@ -107,12 +107,12 @@ bool SAO::create(x265_param* param, int initCommon)
for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
{
- CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1);
- CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1);
+ CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1);
+ CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1);
// SAO asm code will read 1 pixel before and after, so pad by 2
// NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here
- CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32);
+ CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32);
m_tmpU[i] += 1;
}
@@ -279,8 +279,8 @@ void SAO::applyPixelOffsets(int addr, int typeIdx, int plane)
uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
const CUData* cu = m_frame->m_encData->getPicCTU(addr);
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -573,8 +573,8 @@ void SAO::generateLumaOffsets(SaoCtuParam* ctuParam, int idxY, int idxX)
{
PicYuv* reconPic = m_frame->m_reconPic;
intptr_t stride = reconPic->m_stride;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
int addr = idxY * m_numCuInWidth + idxX;
pixel* rec = reconPic->getLumaAddr(addr);
@@ -633,8 +633,8 @@ void SAO::generateChromaOffsets(SaoCtuParam* ctuParam[3], int idxY, int idxX)
{
PicYuv* reconPic = m_frame->m_reconPic;
intptr_t stride = reconPic->m_strideC;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
{
ctuWidth >>= m_hChromaShift;
@@ -744,8 +744,8 @@ void SAO::calcSaoStatsCTU(int addr, int plane)
intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride;
uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -791,9 +791,9 @@ void SAO::calcSaoStatsCTU(int addr, int plane)
// WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize
X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n");
if (plane)
- primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+ primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
else
- primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+ primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
}
else
{
@@ -928,8 +928,8 @@ void SAO::calcSaoStatsCu_BeforeDblk(Frame* frame, int idxX, int idxY)
intptr_t stride = reconPic->m_stride;
uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -1553,14 +1553,17 @@ void SAO::saoLumaComponentParamDist(SAOParam* saoParam, int32_t addr, int64_t& r
}
// Estimate Best Position
- int64_t bestRDCostBO = MAX_INT64;
int32_t bestClassBO = 0;
+ int64_t currentRDCost = costClasses[0];
+ currentRDCost += costClasses[1];
+ currentRDCost += costClasses[2];
+ currentRDCost += costClasses[3];
+ int64_t bestRDCostBO = currentRDCost;
- for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
+ for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
{
- int64_t currentRDCost = 0;
- for (int j = i; j < i + SAO_NUM_OFFSET; j++)
- currentRDCost += costClasses[j];
+ currentRDCost -= costClasses[i - 1];
+ currentRDCost += costClasses[i + 3];
if (currentRDCost < bestRDCostBO)
{
diff --git a/source/encoder/search.cpp b/source/encoder/search.cpp
index e5e7ff1..21a0ed8 100644
--- a/source/encoder/search.cpp
+++ b/source/encoder/search.cpp
@@ -120,8 +120,8 @@ bool Search::initSearch(const x265_param& param, ScalingList& scalingList)
CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2);
m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL;
m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC;
- ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
- ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
}
}
else
@@ -130,15 +130,15 @@ bool Search::initSearch(const x265_param& param, ScalingList& scalingList)
{
CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL);
m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL;
- ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
- ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
}
}
/* the rest of these buffers are indexed per-depth */
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
- int cuSize = g_maxCUSize >> i;
+ int cuSize = param.maxCUSize >> i;
ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp);
ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp);
ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp);
@@ -186,7 +186,7 @@ Search::~Search()
m_rqt[i].resiQtYuv.destroy();
}
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
m_rqt[i].tmpResiYuv.destroy();
m_rqt[i].tmpPredYuv.destroy();
@@ -2073,7 +2073,7 @@ void Search::singleMotionEstimation(Search& master, Mode& interMode, const Predi
int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
- if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
+ if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
{
MV lmv = getLowresMV(interMode.cu, pu, list, ref);
if (lmv.notZero())
@@ -2082,7 +2082,7 @@ void Search::singleMotionEstimation(Search& master, Mode& interMode, const Predi
setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);
- int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
/* Get total cost of partition, but only include MV bit cost once */
@@ -2108,6 +2108,17 @@ void Search::singleMotionEstimation(Search& master, Mode& interMode, const Predi
}
}
+void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv)
+{
+ CUData& cu = interMode.cu;
+ const Slice *slice = m_slice;
+ MV mv = cu.m_mv[list][pu.puAbsPartIdx];
+ cu.clipMv(mv);
+ MV mvmin, mvmax;
+ setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax);
+ m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv);
+}
+
/* find the best inter prediction for each PU of specified mode */
void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
{
@@ -2150,7 +2161,7 @@ void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChroma
cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
/* Uni-directional prediction */
- if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+ if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
|| (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
{
for (int list = 0; list < numPredDir; list++)
@@ -2180,7 +2191,7 @@ void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChroma
if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
mvpIn = bestME[list].mv;
- int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
/* Get total cost of partition, but only include MV bit cost once */
@@ -2286,7 +2297,7 @@ void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChroma
int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
- if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
+ if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
{
MV lmv = getLowresMV(cu, pu, list, ref);
if (lmv.notZero())
@@ -2300,7 +2311,7 @@ void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChroma
m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
}
setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
- int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
/* Get total cost of partition, but only include MV bit cost once */
@@ -2582,11 +2593,11 @@ void Search::setSearchRange(const CUData& cu, const MV& mvp, int merange, MV& mv
cu.clipMv(mvmax);
if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
- cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
+ cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth)
{
int safeX, maxSafeMv;
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
maxSafeMv = (safeX - cu.m_cuPelX) * 4;
mvmax.x = X265_MIN(mvmax.x, maxSafeMv);
mvmin.x = X265_MIN(mvmin.x, maxSafeMv);
diff --git a/source/encoder/search.h b/source/encoder/search.h
index 2f9805b..f6cc651 100644
--- a/source/encoder/search.h
+++ b/source/encoder/search.h
@@ -204,9 +204,9 @@ struct CUStats
memset(this, 0, sizeof(*this));
}
- void accumulate(CUStats& other)
+ void accumulate(CUStats& other, x265_param& param)
{
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= param.maxCUDepth; i++)
{
intraRDOElapsedTime[i] += other.intraRDOElapsedTime[i];
interRDOElapsedTime[i] += other.interRDOElapsedTime[i];
@@ -311,6 +311,7 @@ public:
// estimation inter prediction (non-skip)
void predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t masks[2]);
+ void searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv);
// encode residual and compute rd-cost for inter mode
void encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
void encodeResAndCalcRdSkipCU(Mode& interMode);
diff --git a/source/encoder/sei.cpp b/source/encoder/sei.cpp
index 52bed84..24c9d4a 100644
--- a/source/encoder/sei.cpp
+++ b/source/encoder/sei.cpp
@@ -54,21 +54,23 @@ void SEI::write(Bitstream& bs, const SPS& sps)
}
WRITE_CODE(type, 8, "payload_type");
uint32_t payloadSize;
- if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+ if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
{
if (hrdTypes)
{
X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
payloadSize = count.getNumberOfWrittenBits() >> 3;
}
- else
+ else if (m_payloadType == USER_DATA_UNREGISTERED)
payloadSize = m_payloadSize + 16;
+ else
+ payloadSize = m_payloadSize;
for (; payloadSize >= 0xff; payloadSize -= 0xff)
WRITE_CODE(0xff, 8, "payload_size");
WRITE_CODE(payloadSize, 8, "payload_size");
}
- else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+ else
WRITE_CODE(m_payloadSize, 8, "payload_size");
/* virtual writeSEI method, write to bs */
writeSEI(sps);
diff --git a/source/encoder/sei.h b/source/encoder/sei.h
index b87688e..ac7a913 100644
--- a/source/encoder/sei.h
+++ b/source/encoder/sei.h
@@ -276,27 +276,17 @@ public:
m_payloadSize = 0;
}
- uint8_t *cim;
+ uint8_t *m_payload;
// daniel.vt at samsung.com :: for the Creative Intent Meta Data Encoding ( seongnam.oh at samsung.com )
void writeSEI(const SPS&)
{
- if (!cim)
+ if (!m_payload)
return;
- int i = 0;
- int payloadSize = m_payloadSize;
- while (cim[i] == 0xFF)
- {
- i++;
- payloadSize += cim[i];
- WRITE_CODE(0xFF, 8, "payload_size");
- }
- WRITE_CODE(payloadSize, 8, "payload_size");
- i++;
- payloadSize += i;
- for (; i < payloadSize; ++i)
- WRITE_CODE(cim[i], 8, "creative_intent_metadata");
+ uint32_t i = 0;
+ for (; i < m_payloadSize; ++i)
+ WRITE_CODE(m_payload[i], 8, "creative_intent_metadata");
}
};
}
diff --git a/source/encoder/slicetype.cpp b/source/encoder/slicetype.cpp
index d3f62f4..d7638a4 100644
--- a/source/encoder/slicetype.cpp
+++ b/source/encoder/slicetype.cpp
@@ -893,7 +893,7 @@ void Lookahead::getEstimatedPictureCost(Frame *curFrame)
if (m_param->rc.cuTree && !m_param->rc.bStatRead)
/* update row satds based on cutree offsets */
curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b);
- else if (m_param->analysisMode != X265_ANALYSIS_LOAD)
+ else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor)
{
if (m_param->rc.aqMode)
curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b];
@@ -907,7 +907,7 @@ void Lookahead::getEstimatedPictureCost(Frame *curFrame)
curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
- uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+ uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
double *qp_offset = 0;
/* Factor in qpoffsets based on Aq/Cutree in CU costs */
@@ -1638,6 +1638,13 @@ bool Lookahead::scenecut(Lowres **frames, int p0, int p1, bool bRealScenecut, in
m_isSceneTransition = false; /* Signal end of scene transitioning */
}
+ if (m_param->csvLogLevel >= 2)
+ {
+ int64_t icost = frames[p1]->costEst[0][0];
+ int64_t pcost = frames[p1]->costEst[p1 - p0][0];
+ frames[p1]->ipCostRatio = (double)icost / pcost;
+ }
+
/* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
the former for I decisions and the latter for P/B decisions. It's possible that the first
analysis detected scenecuts which were later nulled due to scene transitioning, in which
@@ -1812,7 +1819,8 @@ void Lookahead::calcMotionAdaptiveQuantFrame(Lowres **frames, int p0, int p1, in
MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
int32_t x = mvs[cuIndex].x;
int32_t y = mvs[cuIndex].y;
- displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+ // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits
+ displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y)));
}
else
displacement += 0.0;
@@ -2400,7 +2408,7 @@ void CostEstimateGroup::estimateCUCost(LookaheadTLD& tld, int cuX, int cuY, int
/* ME will never return a cost larger than the cost @MVP, so we do not
* have to check that ME cost is more than the estimated merge cost */
- fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV);
+ fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices);
if (skipCost < 64 && skipCost < fencCost && bBidir)
{
fencCost = skipCost;
diff --git a/source/test/ipfilterharness.cpp b/source/test/ipfilterharness.cpp
index 312a878..95ac9c7 100644
--- a/source/test/ipfilterharness.cpp
+++ b/source/test/ipfilterharness.cpp
@@ -38,10 +38,8 @@ IPFilterHarness::IPFilterHarness()
{
pixel_test_buff[0][i] = rand() & PIXEL_MAX;
short_test_buff[0][i] = (rand() % (2 * SMAX)) - SMAX;
-
pixel_test_buff[1][i] = PIXEL_MIN;
- short_test_buff[1][i] = SMIN;
-
+ short_test_buff[1][i] = (int16_t)SMIN;
pixel_test_buff[2][i] = PIXEL_MAX;
short_test_buff[2][i] = SMAX;
}
diff --git a/source/test/ipfilterharness.h b/source/test/ipfilterharness.h
index 3edbd6a..fcf4360 100644
--- a/source/test/ipfilterharness.h
+++ b/source/test/ipfilterharness.h
@@ -39,8 +39,7 @@ protected:
enum { ITERS = 100 };
enum { TEST_CASES = 3 };
enum { SMAX = 1 << 12 };
- enum { SMIN = -1 << 12 };
-
+ enum { SMIN = (unsigned)-1 << 12 };
ALIGN_VAR_32(pixel, pixel_buff[TEST_BUF_SIZE]);
int16_t short_buff[TEST_BUF_SIZE];
int16_t IPF_vec_output_s[TEST_BUF_SIZE];
diff --git a/source/test/pixelharness.cpp b/source/test/pixelharness.cpp
index 8727d2e..4feee58 100644
--- a/source/test/pixelharness.cpp
+++ b/source/test/pixelharness.cpp
@@ -44,9 +44,8 @@ PixelHarness::PixelHarness()
uchar_test_buff[0][i] = rand() % ((1 << 8) - 1);
residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only
double_test_buff[0][i] = (double)(short_test_buff[0][i]) / 256.0;
-
pixel_test_buff[1][i] = PIXEL_MIN;
- short_test_buff[1][i] = SMIN;
+ short_test_buff[1][i] = (int16_t)SMIN;
short_test_buff1[1][i] = PIXEL_MIN;
short_test_buff2[1][i] = -16384;
int_test_buff[1][i] = SHORT_MIN;
@@ -2003,6 +2002,76 @@ bool PixelHarness::check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChrom
return true;
}
+bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt)
+{
+ intptr_t srcStep = 64;
+ int j = 0;
+ uint32_t dst_ref[BUFFSIZE] = { 0 };
+ uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+ for (int i = 0; i < 64; i++)
+ {
+ dst_ref[i] = pixel_test_buff[0][i];
+ dst_opt[i] = pixel_test_buff[0][i];
+ }
+
+ for (int i = 0, k = 0; i < BUFFSIZE; i++)
+ {
+ if (i % 64 == 0)
+ k++;
+ dst_ref[i] = dst_ref[i % 64] + k;
+ dst_opt[i] = dst_opt[i % 64] + k;
+ }
+
+ int padx = 4;
+ int pady = 4;
+ uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx;
+ uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx;
+ for (int i = 0; i < ITERS; i++)
+ {
+ ref(dst_ref_ptr, srcStep);
+ checked(opt, dst_opt_ptr, srcStep);
+
+ if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+ return false;
+
+ reportfail()
+ j += INCR;
+ }
+ return true;
+}
+
+bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt)
+{
+ /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need
+ * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth
+ * where data movement in AVX2 is 8 elements at a time */
+ intptr_t srcStep[2] = { 56, 64 };
+ int j = 0;
+ uint32_t dst_ref[BUFFSIZE] = { 0 };
+ uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+ int padx = 4;
+ int pady = 4;
+ for (int l = 0; l < 2; l++)
+ {
+ uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx;
+ uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * pady + padx;
+ for (int k = 0; k < ITERS; k++)
+ {
+ ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]);
+ checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]);
+
+ if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+ return false;
+
+ reportfail()
+ j += INCR;
+ }
+ }
+ return true;
+}
+
bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt)
{
if (opt.pu[part].satd)
@@ -2688,6 +2757,64 @@ bool PixelHarness::testCorrectness(const EncoderPrimitives& ref, const EncoderPr
}
}
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k]))
+ {
+ switch (k)
+ {
+ case 0:
+ printf("Integral4v failed!\n");
+ break;
+ case 1:
+ printf("Integral8v failed!\n");
+ break;
+ case 2:
+ printf("Integral12v failed!\n");
+ break;
+ case 3:
+ printf("Integral16v failed!\n");
+ break;
+ case 4:
+ printf("Integral24v failed!\n");
+ break;
+ case 5:
+ printf("Integral32v failed!\n");
+ break;
+ }
+ return false;
+ }
+ }
+
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k]))
+ {
+ switch (k)
+ {
+ case 0:
+ printf("Integral4h failed!\n");
+ break;
+ case 1:
+ printf("Integral8h failed!\n");
+ break;
+ case 2:
+ printf("Integral12h failed!\n");
+ break;
+ case 3:
+ printf("Integral16h failed!\n");
+ break;
+ case 4:
+ printf("Integral24h failed!\n");
+ break;
+ case 5:
+ printf("Integral32h failed!\n");
+ break;
+ }
+ return false;
+ }
+ }
return true;
}
@@ -3210,4 +3337,67 @@ void PixelHarness::measureSpeed(const EncoderPrimitives& ref, const EncoderPrimi
HEADER0("pelFilterChroma_Horizontal");
REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ);
}
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_initv[k])
+ {
+ switch (k)
+ {
+ case 0:
+ HEADER0("integral_init4v");
+ break;
+ case 1:
+ HEADER0("integral_init8v");
+ break;
+ case 2:
+ HEADER0("integral_init12v");
+ break;
+ case 3:
+ HEADER0("integral_init16v");
+ break;
+ case 4:
+ HEADER0("integral_init24v");
+ break;
+ case 5:
+ HEADER0("integral_init32v");
+ break;
+ default:
+ break;
+ }
+ REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE);
+ }
+ }
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_inith[k])
+ {
+ uint32_t dst_buf[BUFFSIZE] = { 0 };
+ switch (k)
+ {
+ case 0:
+ HEADER0("integral_init4h");
+ break;
+ case 1:
+ HEADER0("integral_init8h");
+ break;
+ case 2:
+ HEADER0("integral_init12h");
+ break;
+ case 3:
+ HEADER0("integral_init16h");
+ break;
+ case 4:
+ HEADER0("integral_init24h");
+ break;
+ case 5:
+ HEADER0("integral_init32h");
+ break;
+ default:
+ break;
+ }
+ REPORT_SPEEDUP(opt.integral_inith[k], ref.integral_inith[k], dst_buf, pbuf1, STRIDE);
+ }
+ }
}
diff --git a/source/test/pixelharness.h b/source/test/pixelharness.h
index e67edb4..08eac39 100644
--- a/source/test/pixelharness.h
+++ b/source/test/pixelharness.h
@@ -40,7 +40,7 @@ protected:
enum { BUFFSIZE = STRIDE * (MAX_HEIGHT + PAD_ROWS) + INCR * ITERS };
enum { TEST_CASES = 3 };
enum { SMAX = 1 << 12 };
- enum { SMIN = -1 << 12 };
+ enum { SMIN = (unsigned)-1 << 12 };
enum { RMAX = PIXEL_MAX - PIXEL_MIN }; //The maximum value obtained by subtracting pixel values (residual max)
enum { RMIN = PIXEL_MIN - PIXEL_MAX }; //The minimum value obtained by subtracting pixel values (residual min)
@@ -126,6 +126,8 @@ protected:
bool check_pelFilterLumaStrong_H(pelFilterLumaStrong_t ref, pelFilterLumaStrong_t opt);
bool check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChroma_t opt);
bool check_pelFilterChroma_H(pelFilterChroma_t ref, pelFilterChroma_t opt);
+ bool check_integral_initv(integralv_t ref, integralv_t opt);
+ bool check_integral_inith(integralh_t ref, integralh_t opt);
public:
diff --git a/source/test/regression-tests.txt b/source/test/regression-tests.txt
index 2e3df7c..1e35dc0 100644
--- a/source/test/regression-tests.txt
+++ b/source/test/regression-tests.txt
@@ -17,17 +17,17 @@ BasketballDrive_1920x1080_50.y4m,--preset veryfast --tune zerolatency --no-tempo
BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
@@ -51,7 +51,7 @@ DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset slow --temporal-layers --no-psy
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -68,8 +68,8 @@ KristenAndSara_1280x720_60.y4m,--preset medium --no-cutree --max-tu-size 16
KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
News-4k.y4m,--preset superfast --slices 4 --aq-mode 0
News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
@@ -123,7 +123,7 @@ old_town_cross_444_720p50.y4m,--preset ultrafast --weightp --min-cu 32
old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless
diff --git a/source/x265-extras.cpp b/source/x265-extras.cpp
index e488ab6..58cf0d4 100644
--- a/source/x265-extras.cpp
+++ b/source/x265-extras.cpp
@@ -25,7 +25,7 @@
#include "x265.h"
#include "x265-extras.h"
-
+#include "param.h"
#include "common.h"
using namespace X265_NS;
@@ -38,14 +38,8 @@ static const char* summaryCSVHeader =
"B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), "
"MaxCLL, MaxFALL, Version\n";
-FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level)
+FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level)
{
- if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture)
- {
- fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n");
- return NULL;
- }
-
FILE *csvfp = x265_fopen(fname, "r");
if (csvfp)
{
@@ -62,6 +56,8 @@ FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char*
if (level)
{
fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, ");
+ if (level >= 2)
+ fprintf(csvfp, "I/P cost ratio, ");
if (param.rc.rateControlMode == X265_RC_CRF)
fprintf(csvfp, "RateFactor, ");
if (param.rc.vbvBufferSize)
@@ -73,7 +69,7 @@ FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char*
fprintf(csvfp, "Latency, ");
fprintf(csvfp, "List 0, List 1");
uint32_t size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size);
size /= 2;
@@ -82,7 +78,7 @@ FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char*
size = param.maxCUSize;
if (param.bEnableRectInter)
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size);
if (param.bEnableAMP)
@@ -92,29 +88,56 @@ FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char*
}
else
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Inter %dx%d", size, size);
size /= 2;
}
}
size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Skip %dx%d", size, size);
size /= 2;
}
size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Merge %dx%d", size, size);
size /= 2;
}
- fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy");
- /* detailed performance statistics */
if (level >= 2)
- fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+ {
+ fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy,"
+ " Min Luma Level, Max Luma Level, Avg Luma Level");
+
+ if (param.internalCsp != X265_CSP_I400)
+ fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level");
+
+ /* PU statistics */
+ size = param.maxCUSize;
+ for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+ {
+ fprintf(csvfp, ", Intra %dx%d", size, size);
+ fprintf(csvfp, ", Skip %dx%d", size, size);
+ fprintf(csvfp, ", AMP %d", size);
+ fprintf(csvfp, ", Inter %dx%d", size, size);
+ fprintf(csvfp, ", Merge %dx%d", size, size);
+ fprintf(csvfp, ", Inter %dx%d", size, size / 2);
+ fprintf(csvfp, ", Merge %dx%d", size, size / 2);
+ fprintf(csvfp, ", Inter %dx%d", size / 2, size);
+ fprintf(csvfp, ", Merge %dx%d", size / 2, size);
+ size /= 2;
+ }
+
+ if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+ fprintf(csvfp, ", 4x4");
+
+ /* detailed performance statistics */
+ fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms),"
+ "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+ }
fprintf(csvfp, "\n");
}
else
@@ -131,7 +154,10 @@ void x265_csvlog_frame(FILE* csvfp, const x265_param& param, const x265_picture&
return;
const x265_frame_stats* frameStats = &pic.frameData;
- fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+ fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc,
+ frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+ if (level >= 2)
+ fprintf(csvfp, "%.2f,", frameStats->ipCostRatio);
if (param.rc.rateControlMode == X265_RC_CRF)
fprintf(csvfp, "%.3lf,", frameStats->rateFactor);
if (param.rc.vbvBufferSize)
@@ -159,39 +185,76 @@ void x265_csvlog_frame(FILE* csvfp, const x265_param& param, const x265_picture&
else
fputs(" -,", csvfp);
}
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]);
- fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
- if (param.bEnableRectInter)
+
+ if (level)
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0],
+ frameStats->cuStats.percentIntraDistribution[depth][1],
+ frameStats->cuStats.percentIntraDistribution[depth][2]);
+ fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
+ if (param.bEnableRectInter)
{
- fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]);
- if (param.bEnableAMP)
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ {
+ fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0],
+ frameStats->cuStats.percentInterDistribution[depth][1]);
+ if (param.bEnableAMP)
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+ }
}
+ else
+ {
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
+ }
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
}
- else
- {
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
- }
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
- fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy);
if (level >= 2)
{
- fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
+ fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion,
+ frameStats->avgChromaDistortion,
+ frameStats->avgPsyEnergy,
+ frameStats->avgResEnergy);
+
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel);
+
+ if (param.internalCsp != X265_CSP_I400)
+ {
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel);
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel);
+ }
+
+ for (uint32_t i = 0; i < param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+ {
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]);
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]);
+ fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]);
+ for (uint32_t j = 0; j < 3; j++)
+ {
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentInterPu[i][j]);
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentMergePu[i][j]);
+ }
+ }
+ if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+ fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentNxN);
+
+ fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime,
+ frameStats->wallTime, frameStats->refWaitWallTime,
+ frameStats->totalCTUTime, frameStats->stallTime,
+ frameStats->totalFrameTime);
+
fprintf(csvfp, " %.3lf, %d", frameStats->avgWPP, frameStats->countRowBlocks);
}
fprintf(csvfp, "\n");
fflush(stderr);
}
-void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv)
+void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv)
{
if (!csvfp)
return;
@@ -204,13 +267,27 @@ void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& para
}
// CLI arguments or other
- fputc('"', csvfp);
- for (int i = 1; i < argc; i++)
+ if (argc)
{
- fputc(' ', csvfp);
- fputs(argv[i], csvfp);
+ fputc('"', csvfp);
+ for (int i = 1; i < argc; i++)
+ {
+ fputc(' ', csvfp);
+ fputs(argv[i], csvfp);
+ }
+ fputc('"', csvfp);
+ }
+ else
+ {
+ const x265_param* paramTemp = ¶m;
+ char *opts = x265_param2string((x265_param*)paramTemp, padx, pady);
+ if (opts)
+ {
+ fputc('"', csvfp);
+ fputs(opts, csvfp);
+ fputc('"', csvfp);
+ }
}
- fputc('"', csvfp);
// current date and time
time_t now;
diff --git a/source/x265-extras.h b/source/x265-extras.h
index d4b10eb..5b29345 100644
--- a/source/x265-extras.h
+++ b/source/x265-extras.h
@@ -44,7 +44,7 @@ extern "C" {
* closed by the caller using fclose(). If level is 0, then no frame logging
* header is written to the file. This function will return NULL if it is unable
* to open the file for write or if it detects a structure size skew */
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
/* Log frame statistics to the CSV file handle. level should have been non-zero
* in the call to x265_csvlog_open() if this function is called. */
@@ -53,7 +53,7 @@ LIBAPI void x265_csvlog_frame(FILE* csvfp, const x265_param& param, const x265_p
/* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
* intended to be command line arguments passed to the encoder. Encode
* statistics should be queried from the encoder just prior to closing it. */
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
/* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
* the residual bits to dither each row. */
diff --git a/source/x265.cpp b/source/x265.cpp
index 9f61cd7..2703109 100644
--- a/source/x265.cpp
+++ b/source/x265.cpp
@@ -73,15 +73,12 @@ struct CLIOptions
ReconFile* recon;
OutputFile* output;
FILE* qpfile;
- FILE* csvfpt;
- const char* csvfn;
const char* reconPlayCmd;
const x265_api* api;
x265_param* param;
bool bProgress;
bool bForceY4m;
bool bDither;
- int csvLogLevel;
uint32_t seek; // number of frames to skip from the beginning
uint32_t framesToBeEncoded; // number of frames to encode
uint64_t totalbytes;
@@ -97,8 +94,6 @@ struct CLIOptions
recon = NULL;
output = NULL;
qpfile = NULL;
- csvfpt = NULL;
- csvfn = NULL;
reconPlayCmd = NULL;
api = NULL;
param = NULL;
@@ -109,7 +104,6 @@ struct CLIOptions
startTime = x265_mdate();
prevUpdateTime = 0;
bDither = false;
- csvLogLevel = 0;
}
void destroy();
@@ -129,9 +123,6 @@ void CLIOptions::destroy()
if (qpfile)
fclose(qpfile);
qpfile = NULL;
- if (csvfpt)
- fclose(csvfpt);
- csvfpt = NULL;
if (output)
output->release();
output = NULL;
@@ -292,8 +283,6 @@ bool CLIOptions::parse(int argc, char **argv)
if (0) ;
OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
- OPT("csv") this->csvfn = optarg;
- OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError);
OPT("no-progress") this->bProgress = false;
OPT("output") outputfn = optarg;
OPT("input") inputfn = optarg;
@@ -530,8 +519,7 @@ static int get_argv_utf8(int *argc_ptr, char ***argv_ptr)
* 1 - unable to parse command line
* 2 - unable to open encoder
* 3 - unable to generate stream headers
- * 4 - encoder abort
- * 5 - unable to open csv file */
+ * 4 - encoder abort */
int main(int argc, char **argv)
{
@@ -586,28 +574,15 @@ int main(int argc, char **argv)
/* get the encoder parameters post-initialization */
api->encoder_parameters(encoder, param);
- if (cliopt.csvfn)
- {
- cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel);
- if (!cliopt.csvfpt)
- {
- x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn);
- cliopt.destroy();
- if (cliopt.api)
- cliopt.api->param_free(cliopt.param);
- exit(5);
- }
- }
-
- /* Control-C handler */
+ /* Control-C handler */
if (signal(SIGINT, sigint_handler) == SIG_ERR)
x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
x265_picture pic_orig, pic_out;
x265_picture *pic_in = &pic_orig;
- /* Allocate recon picture if analysisMode is enabled */
+ /* Allocate recon picture if analysisReuseMode is enabled */
std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
- x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL;
+ x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
uint32_t inFrameCount = 0;
uint32_t outFrameCount = 0;
x265_nal *p_nal;
@@ -698,8 +673,6 @@ int main(int argc, char **argv)
}
cliopt.printStatus(outFrameCount);
- if (numEncoded && cliopt.csvLogLevel)
- x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
}
/* Flush the encoder */
@@ -730,8 +703,6 @@ int main(int argc, char **argv)
}
cliopt.printStatus(outFrameCount);
- if (numEncoded && cliopt.csvLogLevel)
- x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
if (!numEncoded)
break;
@@ -746,8 +717,8 @@ fail:
delete reconPlay;
api->encoder_get_stats(encoder, &stats, sizeof(stats));
- if (cliopt.csvfpt && !b_ctrl_c)
- x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv);
+ if (param->csvfn && !b_ctrl_c)
+ api->encoder_log(encoder, argc, argv);
api->encoder_close(encoder);
int64_t second_largest_pts = 0;
diff --git a/source/x265.h b/source/x265.h
index f2ab68b..a242461 100644
--- a/source/x265.h
+++ b/source/x265.h
@@ -24,10 +24,9 @@
#ifndef X265_H
#define X265_H
-
#include <stdint.h>
+#include <stdio.h>
#include "x265_config.h"
-
#ifdef __cplusplus
extern "C" {
#endif
@@ -98,6 +97,7 @@ typedef struct x265_analysis_data
uint32_t sliceType;
uint32_t numCUsInFrame;
uint32_t numPartitions;
+ uint32_t depthBytes;
int bScenecut;
void* wt;
void* interData;
@@ -117,6 +117,20 @@ typedef struct x265_cu_stats
} x265_cu_stats;
+/* pu statistics */
+typedef struct x265_pu_stats
+{
+ double percentSkipPu[4]; // Percentage of skip cu in all depths
+ double percentIntraPu[4]; // Percentage of intra modes in all depths
+ double percentAmpPu[4]; // Percentage of amp modes in all depths
+ double percentInterPu[4][3]; // Percentage of inter 2nx2n, 2nxn and nx2n in all depths
+ double percentMergePu[4][3]; // Percentage of merge 2nx2n, 2nxn and nx2n in all depth
+ double percentNxN;
+
+ /* All the above values will add up to 100%. */
+} x265_pu_stats;
+
+
typedef struct x265_analysis_2Pass
{
uint32_t poc;
@@ -154,13 +168,41 @@ typedef struct x265_frame_stats
int list0POC[16];
int list1POC[16];
uint16_t maxLumaLevel;
+ uint16_t minLumaLevel;
+
+ uint16_t maxChromaULevel;
+ uint16_t minChromaULevel;
+ double avgChromaULevel;
+
+
+ uint16_t maxChromaVLevel;
+ uint16_t minChromaVLevel;
+ double avgChromaVLevel;
+
char sliceType;
int bScenecut;
+ double ipCostRatio;
int frameLatency;
x265_cu_stats cuStats;
+ x265_pu_stats puStats;
double totalFrameTime;
} x265_frame_stats;
+typedef struct x265_ctu_info_t
+{
+ int32_t ctuAddress;
+ int32_t ctuPartitions[64];
+ void* ctuInfo;
+} x265_ctu_info_t;
+
+typedef enum
+{
+ NO_CTU_INFO = 0,
+ HAS_CTU_INFO = 1,
+ CTU_INFO_CHANGE = 2,
+}CTUInfo;
+
+
/* Arbitrary User SEI
* Payload size is in bytes and the payload pointer must be non-NULL.
* Payload types and syntax can be found in Annex D of the H.265 Specification.
@@ -258,15 +300,15 @@ typedef struct x265_picture
* to allow the encoder to determine base QP */
int forceqp;
- /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input
+ /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input
* and output. Else the user must call x265_alloc_analysis_data() to
* allocate analysis buffers for every picture passed to the encoder.
*
- * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData
+ * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData
* member pointers are valid, the encoder will use the data stored here to
* reduce encoder work.
*
- * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData
+ * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData
* member pointers are valid, the encoder will write output analysis into
* this data structure */
x265_analysis_data analysisData;
@@ -612,7 +654,14 @@ typedef struct x265_param
* X265_LOG_FULL, default is X265_LOG_INFO */
int logLevel;
- /* Filename of CSV log. Now deprecated */
+ /* Level of csv logging. 0 is summary, 1 is frame level logging,
+ * 2 is frame level logging with performance statistics */
+ int csvLogLevel;
+
+ /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit
+ * per-slice statistics to this log file in encode order. Otherwise the
+ * encoder will emit per-stream statistics into the log file when
+ * x265_encoder_log is called (presumably at the end of the encode) */
const char* csvfn;
/*== Internal Picture Specification ==*/
@@ -1057,10 +1106,10 @@ typedef struct x265_param
* buffers. if X265_ANALYSIS_LOAD, read analysis information into analysis
* buffer and use this analysis information to reduce the amount of work
* the encoder must perform. Default X265_ANALYSIS_OFF */
- int analysisMode;
+ int analysisReuseMode;
- /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
- const char* analysisFileName;
+ /* Filename for analysisReuseMode save/load. Default name is "x265_analysis.dat" */
+ const char* analysisReuseFileName;
/*== Rate Control ==*/
@@ -1194,6 +1243,9 @@ typedef struct x265_param
/* sets a hard lower limit on QP */
int qpMin;
+
+ /* internally enable if tune grain is set */
+ int bEnableConstVbv;
} rc;
/*== Video Usability Information ==*/
@@ -1376,9 +1428,9 @@ typedef struct x265_param
int bHDROpt;
/* A value between 1 and 10 (both inclusive) determines the level of
- * information stored/reused in save/load analysis-mode. Higher the refine
- * level higher the informtion stored/reused. Default is 5 */
- int analysisRefineLevel;
+ * information stored/reused in save/load analysis-reuse-mode. Higher the refine
+ * level higher the information stored/reused. Default is 5 */
+ int analysisReuseLevel;
/* Limit Sample Adaptive Offset filter computation by early terminating SAO
* process based on inter prediction mode, CTU spatial-domain correlations,
@@ -1391,7 +1443,44 @@ typedef struct x265_param
/* Insert tone mapping information only for IDR frames and when the
* tone mapping information changes. */
int bDhdr10opt;
+
+ /* Determine how x265 react to the content information recieved through the API */
+ int bCTUInfo;
+
+ /* Use ratecontrol statistics from pic_in, if available*/
+ int bUseRcStats;
+
+ /* Factor by which input video is scaled down for analysis save mode. Default is 0 */
+ int scaleFactor;
+
+ /* Enable intra refinement in load mode*/
+ int intraRefine;
+
+ /* Enable inter refinement in load mode*/
+ int interRefine;
+
+ /* Enable motion vector refinement in load mode*/
+ int mvRefine;
+
+ /* Log of maximum CTU size */
+ uint32_t maxLog2CUSize;
+
+ /* Actual CU depth with respect to config depth */
+ uint32_t maxCUDepth;
+
+ /* CU depth with respect to maximum transform size */
+ uint32_t unitSizeDepth;
+
+ /* Number of 4x4 units in maximum CU size */
+ uint32_t num4x4Partitions;
+
+ /* Specify if analysis mode uses file for data reuse */
+ int bUseAnalysisFile;
+
+ /* File pointer for csv log */
+ FILE* csvfpt;
} x265_param;
+
/* x265_param_alloc:
* Allocates an x265_param instance. The returned param structure is not
* special in any way, but using this method together with x265_param_free()
@@ -1558,7 +1647,8 @@ int x265_encoder_reconfig(x265_encoder *, x265_param *);
void x265_encoder_get_stats(x265_encoder *encoder, x265_stats *, uint32_t statsSizeBytes);
/* x265_encoder_log:
- * This function is deprecated */
+ * write a line to the configured CSV file. If a CSV filename was not
+ * configured, or file open failed, this function will perform no write. */
void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
/* x265_encoder_close:
@@ -1581,6 +1671,12 @@ void x265_encoder_close(x265_encoder *);
int x265_encoder_intra_refresh(x265_encoder *);
+/* x265_encoder_ctu_info:
+ * Copy CTU information such as ctu address and ctu partition structure of all
+ * CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+ * the encoder will wait for this copy to complete if enabled.
+ */
+int x265_encoder_ctu_info(x265_encoder *, int poc, x265_ctu_info_t** ctu);
/* x265_cleanup:
* release library static allocations, reset configured CTU size */
void x265_cleanup(void);
@@ -1629,6 +1725,7 @@ typedef struct x265_api
int sizeof_frame_stats; /* sizeof(x265_frame_stats) */
int (*encoder_intra_refresh)(x265_encoder*);
+ int (*encoder_ctu_info)(x265_encoder*, int, x265_ctu_info_t**);
/* add new pointers to the end, or increment X265_MAJOR_VERSION */
} x265_api;
diff --git a/source/x265cli.h b/source/x265cli.h
index 7b85d95..14fd6ce 100644
--- a/source/x265cli.h
+++ b/source/x265cli.h
@@ -122,6 +122,7 @@ static const struct option long_options[] =
{ "scenecut", required_argument, NULL, 0 },
{ "no-scenecut", no_argument, NULL, 0 },
{ "scenecut-bias", required_argument, NULL, 0 },
+ { "ctu-info", required_argument, NULL, 0 },
{ "intra-refresh", no_argument, NULL, 0 },
{ "rc-lookahead", required_argument, NULL, 0 },
{ "lookahead-slices", required_argument, NULL, 0 },
@@ -158,6 +159,8 @@ static const struct option long_options[] =
{ "qpstep", required_argument, NULL, 0 },
{ "qpmin", required_argument, NULL, 0 },
{ "qpmax", required_argument, NULL, 0 },
+ { "const-vbv", no_argument, NULL, 0 },
+ { "no-const-vbv", no_argument, NULL, 0 },
{ "ratetol", required_argument, NULL, 0 },
{ "cplxblur", required_argument, NULL, 0 },
{ "qblur", required_argument, NULL, 0 },
@@ -247,9 +250,13 @@ static const struct option long_options[] =
{ "no-slow-firstpass", no_argument, NULL, 0 },
{ "multi-pass-opt-rps", no_argument, NULL, 0 },
{ "no-multi-pass-opt-rps", no_argument, NULL, 0 },
- { "analysis-mode", required_argument, NULL, 0 },
- { "analysis-file", required_argument, NULL, 0 },
- { "refine-level", required_argument, NULL, 0 },
+ { "analysis-reuse-mode", required_argument, NULL, 0 },
+ { "analysis-reuse-file", required_argument, NULL, 0 },
+ { "analysis-reuse-level", required_argument, NULL, 0 },
+ { "scale-factor", required_argument, NULL, 0 },
+ { "refine-intra", required_argument, NULL, 0 },
+ { "refine-inter", no_argument, NULL, 0 },
+ { "no-refine-inter",no_argument, NULL, 0 },
{ "strict-cbr", no_argument, NULL, 0 },
{ "temporal-layers", no_argument, NULL, 0 },
{ "no-temporal-layers", no_argument, NULL, 0 },
@@ -271,6 +278,8 @@ static const struct option long_options[] =
{ "dhdr10-info", required_argument, NULL, 0 },
{ "dhdr10-opt", no_argument, NULL, 0},
{ "no-dhdr10-opt", no_argument, NULL, 0},
+ { "refine-mv", no_argument, NULL, 0 },
+ { "no-refine-mv", no_argument, NULL, 0 },
{ 0, 0, 0, 0 },
{ 0, 0, 0, 0 },
{ 0, 0, 0, 0 },
@@ -316,9 +325,9 @@ static void showHelp(x265_param *param)
H1(" 1 - i420 (4:2:0 default)\n");
H1(" 2 - i422 (4:2:2)\n");
H1(" 3 - i444 (4:4:4)\n");
-#if ENABLE_DYNAMIC_HDR10
- H0(" --dhdr10-info <filename> JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping \n");
- H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled");
+#if ENABLE_HDR10_PLUS
+ H0(" --dhdr10-info <filename> JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
+ H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
#endif
H0("-f/--frames <integer> Maximum number of frames to encode. Default all\n");
H0(" --seek <integer> First frame to encode\n");
@@ -367,6 +376,11 @@ static void showHelp(x265_param *param)
H1(" --[no-]tskip-fast Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
H1(" --nr-intra <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
H1(" --nr-inter <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
+ H0(" --ctu-info <integer> Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
+ " - 1: force the partitions if CTU information is present\n"
+ " - 2: functionality of (1) and reduce qp if CTU information has changed\n"
+ " - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
+ " Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
H0("\nCoding tools:\n");
H0("-w/--[no-]weightp Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
H0(" --[no-]weightb Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
@@ -431,9 +445,13 @@ static void showHelp(x265_param *param)
H0(" --[no-]analyze-src-pics Motion estimation uses source frame planes. Default disable\n");
H0(" --[no-]slow-firstpass Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
H0(" --[no-]strict-cbr Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
- H0(" --analysis-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode);
- H0(" --analysis-file <filename> Specify file name used for either dumping or reading analysis data.\n");
- H0(" --refine-level <1..10> Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel);
+ H0(" --analysis-reuse-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode);
+ H0(" --analysis-reuse-file <filename> Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
+ H0(" --analysis-reuse-level <1..10> Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel);
+ H0(" --scale-factor <int> Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
+ H0(" --refine-intra <int> Enable intra refinement for load mode. Default %d\n", param->intraRefine);
+ H0(" --[no-]refine-inter Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine));
+ H0(" --[no-]refine-mv Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine));
H0(" --aq-mode <integer> Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
H0(" --aq-strength <float> Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
H0(" --[no-]aq-motion Adaptive Quantization based on the relative motion of each CU w.r.t., frame. Default %s\n", OPT(param->bOptCUDeltaQP));
@@ -446,6 +464,7 @@ static void showHelp(x265_param *param)
H1(" --qpstep <integer> The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
H1(" --qpmin <integer> sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
H1(" --qpmax <integer> sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
+ H0(" --[no-]const-vbv Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
H1(" --cbqpoffs <integer> Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
H1(" --crqpoffs <integer> Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
H1(" --scaling-list <string> Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
--
x265 packaging
More information about the pkg-multimedia-commits
mailing list