
Why does ffmpeg output slightly different RGB values when converting to gbrp and rgb24?

By using one of the following command lines, it is possible to convert a video stream to an RGB buffer:

ffmpeg -i video.mp4 -frames 1 -color_range pc -f rawvideo -pix_fmt rgb24 output.rgb24
ffmpeg -i video.mp4 -frames 1 -color_range pc -f rawvideo -pix_fmt gbrp output.gbrp

These RGB buffers can then be read, for example using Python and NumPy:

import numpy as np


def load_buffer_gbrp(path, width=1920, height=1080):
    """Load a gbrp 8-bit raw buffer from a file"""
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
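    # gbrp stores three full-size planes, in G, B, R order
    data_gbrp = data.reshape((3, height, width))
    img_rgb = np.empty((height, width, 3), dtype=np.uint8)
    img_rgb[..., 0] = data_gbrp[2, ...]  # R is the third plane
    img_rgb[..., 1] = data_gbrp[0, ...]  # G is the first plane
    img_rgb[..., 2] = data_gbrp[1, ...]  # B is the second plane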
    return img_rgb


def load_buffer_rgb24(path, width=1920, height=1080):
    """Load an rgb24 8-bit raw buffer from a file"""
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    img_rgb = data.reshape((height, width, 3))
    return img_rgb


buffer_rgb24 = load_buffer_rgb24("output.rgb24")
buffer_gbrp = load_buffer_gbrp("output.gbrp")

Theoretically, the two outputs should have the same RGB values (only the layout in memory should differ); in the real world, this is not the case:

import matplotlib.pyplot as plt

diff = buffer_rgb24.astype(float) - buffer_gbrp.astype(float)
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, constrained_layout=True, figsize=(12, 2.5))
ax1.imshow(buffer_rgb24)
ax1.set_title("rgb24")
ax2.imshow(buffer_gbrp)
ax2.set_title("gbrp")
im = ax3.imshow(diff[..., 1], vmin=-5, vmax=+5, cmap="seismic")
ax3.set_title("difference (green channel)")
plt.colorbar(im, ax=ax3)
plt.show()

Figure: difference between rgb24 and gbrp

The converted frame differs by more than what could be explained by chroma-subsampling or rounding errors (difference is around 2-3, rounding errors would be less than 1), and, what's worse, seems to have a uniform bias on the whole image.
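To put numbers on the bias rather than reading it off a plot, the per-channel signed mean and maximum absolute difference can be computed. This is a minimal sketch: `summarize_diff` is a hypothetical helper name, and the two arrays below are synthetic stand-ins for any pair of (height, width, 3) uint8 buffers such as buffer_rgb24 and buffer_gbrp.

```python
import numpy as np


def summarize_diff(a, b):
    """Return per-channel (mean signed difference, max absolute difference)."""
    # Widen before subtracting so that uint8 arithmetic cannot wrap around.
    d = a.astype(np.int16) - b.astype(np.int16)
    mean = d.mean(axis=(0, 1))
    max_abs = np.abs(d).max(axis=(0, 1))
    return mean, max_abs


# Synthetic stand-in for two decoded frames: b is a with +2 added to green.
a = np.full((4, 6, 3), 100, dtype=np.uint8)
b = a.copy()
b[..., 1] += 2

mean, max_abs = summarize_diff(a, b)
print("mean signed difference per channel:", mean)
print("max absolute difference per channel:", max_abs)
```

A mean far from zero with a small spread points to a uniform bias; a mean near zero with a larger maximum points to rounding or interpolation differences instead.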

Why is that so, and what ffmpeg parameters affect this behavior?

Good analysis so far. Let me try to add some perspective from the swscale side, hope that helps further in explaining the differences you're seeing and what they technically originate from.

The differences you see are indeed caused by different rounding. These differences are not because rgb24/gbrp are fundamentally different (they are different layouts of the same fundamental data type), but because the implementations were written for different use cases at different times by different people.

yuv420p-to-rgb24 (and the other way around) are very, very old implementations that come from before swscale was part of FFmpeg. These implementations have MMX optimizations and are tuned for conversion on Pentium machines. This is mid-90s technology or so. The idea here was to convert JPEG and MPEG-1 to/from monitor-compatible output before YUV output was a thing. The MMX optimizations are actually pretty well-tuned for their time.

You can imagine that speed was paramount here (at that time, YUV-to-rgb24 conversion was slow), so shortcuts were taken. YUV-to-RGB is a simple matrix multiplication (with coefficients depending on the exact YUV colorspace). However, the resolution of the UV planes is different from that of the Y and RGB planes. In the simple (non-exact) yuv-to-rgb24 conversion, the UV planes are upsampled using nearest-neighbour conversion, so each RGB[x,y] uses Y[x,y] and UV[x/2,y/2] as input; in other words, each UV input sample is reused for a 2x2 block of output RGB pixels. The flag full_chroma_int "undoes" this optimization/shortcut: the chroma plane is upsampled using actual scaling conversions before the YUV-to-RGB conversion is initiated, and this upsampling can use filters such as bilinear, bicubic or even more advanced/expensive kernels (e.g. lanczos, sinc or spline).
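The two chroma upsampling strategies can be sketched in NumPy. This is only an illustration of the idea: the function names are hypothetical, and the linear variant uses a simple centre-aligned interpolation that is not claimed to match swscale's actual filters or chroma sample siting.

```python
import numpy as np


def upsample_chroma_nearest(plane):
    """Reuse each chroma sample for a 2x2 block of output pixels."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)


def upsample_chroma_linear(plane):
    """Separable linear 2x upsampling (illustrative, centre-aligned)."""
    h, w = plane.shape
    p = plane.astype(np.float64)
    # Output sample positions expressed in input coordinates, clamped at edges.
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    # Interpolate along rows first, then along columns.
    rows = np.stack([np.interp(xs, np.arange(w), r) for r in p])
    return np.stack([np.interp(ys, np.arange(h), c) for c in rows.T], axis=1)


# A tiny 2x2 chroma plane with a horizontal step.
u = np.array([[100, 200], [100, 200]], dtype=np.uint8)
nearest = upsample_chroma_nearest(u)
linear = upsample_chroma_linear(u)
print(nearest[0])
print(linear[0])
```

With nearest-neighbour upsampling the step stays a hard edge two pixels wide, while the interpolating variant produces intermediate chroma values, which is one way the two paths can disagree near colour edges.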

bitexact is a generic term in FFmpeg to disable SIMD optimizations that don't generate the exact same output as the C function. I'll ignore that for now beyond just stating what it means.

Lastly, accurate_rnd: if I remember correctly, the idea here is that in matrix multiplications (independent of whether you use chroma plane upsampling or not), the typical way to do the integer equivalent of the floating-point r = v*coef1 + y in a given precision (e.g. using 15-bit coefficients) is r = y + ((v*coef1 + 0x4000) >> 15). However, in x86 SIMD, this requires the instruction pmulhrsw, which is only available in SSSE3, not in MMX. Also, it means that for g = u*coef2 + v*coef3 + y you need pmaddwd and then round/shift using separate instructions. So the MMX SIMD instead uses pmulhw (an unrounded version of pmulhrsw), which basically makes it r = y + ((v*coef1) >> 16) (using 16-bit coefficients). This is mathematically very close, but not as precise, especially not for the G pixel, since it turns g = ((u*coef2 + v*coef3 + 0x8000) >> 16) + y into g = ((u*coef2) >> 16) + ((v*coef3) >> 16) + y. accurate_rnd "undoes" this optimization/shortcut.
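The effect of dropping the rounding term can be checked with plain Python integers, whose right shift floors like an arithmetic shift. This is a toy sketch under stated assumptions: the coefficients are illustrative BT.709-like values chosen here, not taken from swscale, and the function names are hypothetical.

```python
# Illustrative 16-bit fixed-point green coefficients (about -0.1873 and
# -0.4681); these are assumptions for the demo, NOT swscale's tables.
CU = round(-0.1873 * 65536)
CV = round(-0.4681 * 65536)


def green_rounded(y, u, v):
    """One combined multiply-add, rounded to nearest before the shift."""
    return y + ((u * CU + v * CV + 0x8000) >> 16)


def green_truncated(y, u, v):
    """Each product shifted separately with no rounding (pmulhw-style)."""
    return y + ((u * CU) >> 16) + ((v * CV) >> 16)


# Compare both variants over every centred chroma pair (u, v) in [-128, 127].
diffs = [
    green_truncated(128, u, v) - green_rounded(128, u, v)
    for u in range(-128, 128)
    for v in range(-128, 128)
]
print("min:", min(diffs), "max:", max(diffs), "mean:", sum(diffs) / len(diffs))
```

Because each unrounded shift floors its product, the truncated variant can only be equal to or lower than the rounded one, so the error is one-sided rather than centred on zero. This only illustrates the direction of such a bias, not the exact magnitudes swscale produces.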

Now, YUV-to-gbrp. GBR-planar was added for H264 RGB support, since H264 codes RGB as "just another" YUV variant, but with G in the Y plane etc. You can imagine that speed was much less of an issue, as was MMX support. So here, the math was done correctly. In fact, if I remember correctly, accurate_rnd was only added afterwards so that YUV-to-rgb24 could output the same pixels as YUV-to-gbrp and make the two outputs equivalent, but at the cost of not being able to use the (old) MMX optimizations that were inherited when swscale was merged into FFmpeg. The gbrp path upsamples chroma correctly with a user-configured scaling kernel by default, because the planar conversion is only done when all YUV planes have the same size; that is, it strictly only does the matrix multiplication. This was added in something like 2015 or so, so we're talking about an eternity in computer programming terms.

Nowadays, the performance gain from "imprecise" implementations such as YUV-to-rgb24 is not considered worth the quality lost to imprecise rounding and the lack of configurable scaling for the chroma planes. This is why most people will recommend that you use -sws_flags accurate_rnd+full_chroma_int. Also, nowadays there are x86 SIMD (SSSE3 and AVX2) implementations for the "slower" conversion path, whereas around 2010, that was all straight C code with nobody wanting to invest time to optimize it. I'm guessing that -sws_flags accurate_rnd+full_chroma_int will perform slightly worse than the "fast" YUV-to-rgb24 conversion, because it does chroma upsampling and matrix multiplication in two steps instead of one. But on modern x86 hardware, the performance penalty should be minimal and acceptable unless you're severely resource-constrained.

Hope that all makes sense.

The following led me on a wild chase through various ffmpeg options. As far as I could tell, none of this is really documented, so I hope it will be useful to others who are as puzzled as I was by these fairly cryptic behaviors.


The difference is caused by the default parameters for libswscale , the ffmpeg component responsible for converting from YUV to RGB; in particular, adding the full_chroma_int+bitexact+accurate_rnd flags eliminates the difference between the frames:

ffmpeg -i video.mp4 -frames 1 -color_range pc -f rawvideo -pix_fmt rgb24 -sws_flags full_chroma_int+bitexact+accurate_rnd output_good.rgb24
ffmpeg -i video.mp4 -frames 1 -color_range pc -f rawvideo -pix_fmt gbrp -sws_flags full_chroma_int+bitexact+accurate_rnd output_good.gbrp
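When several flag combinations need to be compared, the command lines above can be generated from Python. This is a sketch: `build_ffmpeg_cmd` and `convert` are hypothetical helper names, the options are the ones already used above, and `-y` (overwrite the output file without asking) is an addition made here for scripted use.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, pix_fmt, sws_flags=None):
    """Build the single-frame rawvideo conversion command as an argument list."""
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-frames", "1", "-color_range", "pc",
        "-f", "rawvideo", "-pix_fmt", pix_fmt,
    ]
    if sws_flags:
        # swscale flags are joined with '+', as on the command lines above.
        cmd += ["-sws_flags", "+".join(sws_flags)]
    cmd.append(dst)
    return cmd


def convert(src, dst, pix_fmt, sws_flags=None):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, pix_fmt, sws_flags), check=True)


flags = ["full_chroma_int", "bitexact", "accurate_rnd"]
cmd_default = build_ffmpeg_cmd("video.mp4", "output.rgb24", "rgb24")
cmd_good = build_ffmpeg_cmd("video.mp4", "output_good.rgb24", "rgb24", flags)
print(" ".join(cmd_good))
```

Calling `convert` with each subset of the three flags and comparing the resulting buffers is one way to see which flag accounts for which part of the difference.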

Note that various video forums tout these flags (or a subset thereof) as "better" without really providing explanations, which doesn't really satisfy me. They are indeed better for the issue here, let's see how.

First, the new outputs all align with the gbrp output for the default options, which is good news!

buffer_rgb24_good = load_buffer_rgb24("output_good.rgb24")
buffer_gbrp_good = load_buffer_gbrp("output_good.gbrp")

diff1 = buffer_rgb24_good.astype(float) - buffer_gbrp.astype(float)
diff2 = buffer_gbrp_good.astype(float) - buffer_gbrp.astype(float)
fig, (ax1, ax2) = plt.subplots(ncols=2, constrained_layout=True, figsize=(8, 2.5))
ax1.imshow(diff1[..., 1], vmin=-5, vmax=+5, cmap="seismic")
ax1.set_title("rgb24 (new) - gbrp (default)")
im = ax2.imshow(diff2[..., 1], vmin=-5, vmax=+5, cmap="seismic")
ax2.set_title("gbrp (new) - gbrp (default)")
plt.colorbar(im, ax=ax2)
plt.show()

Figure: difference between outputs with the new flags and with the default flags


The ffmpeg source code uses the following functions internally to do the conversions in libswscale/output.c :

  • yuv2rgb_full_1_c_template (and other variants) for rgb24 with full_chroma_int
  • yuv2rgb_1_c_template (and other variants) for rgb24 without full_chroma_int
  • yuv2gbrp_full_X_c (and other variants) for gbrp , independently of full_chroma_int

An important conclusion is that the full_chroma_int parameter seems to be ignored for gbrp format but not for rgb24 and is the main cause of the uniform bias.

Note that for non-rawvideo outputs, ffmpeg can select a supported pixel format depending on the selected output format, so either conversion path might be used by default without the user being aware of it.


An additional question is: are these the correct values? In other words, is it possible that both may be biased the same way? Taking the colour-science Python package, we can convert YUV data to RGB using a different implementation than ffmpeg to gain more confidence.

ffmpeg can output raw YUV frames in the native format, which can be decoded provided you know how they are laid out.

$ ffmpeg -i video.mp4 -frames 1 -f rawvideo -pix_fmt yuv444p output.yuv
...
Output #0, rawvideo, to 'output.yuv':
...
 Stream #0:0(und): Video: rawvideo... yuv444p

We can read that with Python:

def load_buffer_yuv444p(path, width=1920, height=1080):
    """Load a yuv444p 8-bit raw buffer from a file"""
    data = np.frombuffer(open(path, "rb").read(), dtype=np.uint8)
    img_yuv444 = np.moveaxis(data.reshape((3, height, width)), 0, 2)
    return img_yuv444

buffer_yuv = load_buffer_yuv444p("output.yuv")

Then this can be converted to RGB:

import colour

rgb_ref = colour.YCbCr_to_RGB(buffer_yuv, colour.WEIGHTS_YCBCR["ITU-R BT.709"], in_bits=8, in_legal=True, in_int=True, out_bits=8, out_legal=False, out_int=True)

...and used as a reference:

diff1 = buffer_rgb24_good.astype(float) - rgb_ref.astype(float)
diff2 = buffer_gbrp_good.astype(float) - rgb_ref.astype(float)
diff3 = buffer_rgb24.astype(float) - rgb_ref.astype(float)
diff4 = buffer_gbrp.astype(float) - rgb_ref.astype(float)
fig, axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True, figsize=(8, 5))
im = axes[0, 0].imshow(diff1[..., 1], vmin=-5, vmax=+5, cmap="seismic")
axes[0, 0].set_title("rgb24 (new) - reference")
im = axes[0, 1].imshow(diff2[..., 1], vmin=-5, vmax=+5, cmap="seismic")
axes[0, 1].set_title("gbrp (new) - reference")
im = axes[1, 0].imshow(diff3[..., 1], vmin=-5, vmax=+5, cmap="seismic")
axes[1, 0].set_title("rgb24 (default) - reference")
im = axes[1, 1].imshow(diff4[..., 1], vmin=-5, vmax=+5, cmap="seismic")
axes[1, 1].set_title("gbrp (default) - reference")
plt.show()

Figure: comparison of rgb24/gbrp with old/new flags against the reference

There are remaining differences due to slightly different interpolation methods and rounding errors but no uniform bias, so the two implementations mostly agree up to that.
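As a further cross-check that does not depend on any colour library, the same limited-range BT.709 conversion can be written directly in NumPy. This is a sketch under the same assumptions as the reference above (8-bit, BT.709 weights, limited-range input, full-range output); the function name is hypothetical.

```python
import numpy as np


def ycbcr709_limited_to_rgb(yuv):
    """Convert 8-bit limited-range BT.709 YCbCr (H, W, 3) to full-range RGB."""
    kr, kb = 0.2126, 0.0722          # BT.709 luma weights
    kg = 1.0 - kr - kb
    # Rescale luma from [16, 235] and chroma from [16, 240] to full range.
    y = (yuv[..., 0].astype(np.float64) - 16.0) * (255.0 / 219.0)
    cb = (yuv[..., 1].astype(np.float64) - 128.0) * (255.0 / 224.0)
    cr = (yuv[..., 2].astype(np.float64) - 128.0) * (255.0 / 224.0)
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    # Green follows from the luma definition Y = kr*R + kg*G + kb*B.
    g = (y - kr * r - kb * b) / kg
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


# Three neutral pixels: video black, video white and a mid grey.
pixels = np.array([[[16, 128, 128], [235, 128, 128], [126, 128, 128]]],
                  dtype=np.uint8)
rgb_check = ycbcr709_limited_to_rgb(pixels)
print(rgb_check)
```

Applied to buffer_yuv, the result can be compared against rgb_ref in the same way as the ffmpeg outputs; small disagreements of one code value would be expected from rounding alone.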

(Note: In this example the output.yuv file is in yuv444p, converted automatically by ffmpeg from the native yuv420p format in the above command line, without going through an intermediate RGB conversion. A more complete test would do all the previous conversions from a single raw YUV frame instead of a regular video, to better isolate the differences.)
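A single raw YUV frame for such a test can be generated with NumPy. This is a sketch: the function name, frame size and file name are arbitrary choices made here, and the ffmpeg invocation in the comment uses the rawvideo demuxer's pixel_format and video_size options. Raw input carries no colour metadata, so the matrix and range ffmpeg assumes for it would need to be checked separately.

```python
import numpy as np


def make_yuv420p_frame(width=64, height=64, seed=0):
    """Return the bytes of one random limited-range yuv420p frame."""
    assert width % 2 == 0 and height % 2 == 0, "4:2:0 needs even dimensions"
    rng = np.random.default_rng(seed)
    # Full-resolution luma plane, then two half-resolution chroma planes.
    y = rng.integers(16, 236, size=(height, width), dtype=np.uint8)
    u = rng.integers(16, 241, size=(height // 2, width // 2), dtype=np.uint8)
    v = rng.integers(16, 241, size=(height // 2, width // 2), dtype=np.uint8)
    return y.tobytes() + u.tobytes() + v.tobytes()


frame = make_yuv420p_frame()
print(len(frame), "bytes")

# To use it as ffmpeg input, write it out and describe the layout explicitly:
#   with open("frame.yuv", "wb") as f:
#       f.write(frame)
#   ffmpeg -f rawvideo -pixel_format yuv420p -video_size 64x64 -i frame.yuv \
#          -frames 1 -f rawvideo -pix_fmt rgb24 out.rgb24
```

Random noise is a deliberately harsh input for chroma upsampling; a smooth gradient or a flat colour would isolate the rounding behaviour more cleanly.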
