
What is the right way of upsampling YUV420 (NV12) to YUV422?

I have a YUV420 image (NV12, though that shouldn't matter) and I am trying to upsample it to YUV422.

The problem is that I can't find the right weights to give the UV samples of the YUV420 image when computing the UV samples of the YUV422 image.

x -> Y 
o -> UV

YUV420       YUV422
x x x x      x x x x
o   o        o   o  
x x x x      x x x x
         to  o   o 
x x x x      x x x x
o   o        o   o
x x x x      x x x x
             o   o 

Right now I am just repeating the UV samples, but that is not the right way. So the question is: is there a standard way of doing chroma upsampling? Can someone point me to the theory behind it?

NOTE: I want to implement this myself and am not interested in tools that will do it for me. I would be interested, though, in pointers to the source code of tools that do it according to some standard (assuming there is one :D).

Thanks

You're basically asking: if I have an array of N [vertical] samples (where N = height/2), which happen to be U (or perhaps V), how can I convert that to an array of N*2 samples with correct interpolation? The answer is indeed interpolation. I'm going to ignore the horizontal direction because it is outside the scope of your question, but it works the same way.

First of all: chroma positioning. Let's assume I have an array of N*2 Y [vertical] samples, and the array of U (or V) samples has only N entries. Chroma subsampling clearly implies that for every 2 Y samples there is only one U (or V) sample [vertically]. But it doesn't tell you where the U/V samples are located. In yuv422 [vertical] this is obvious: the vertical position of each U (or V) sample aligns perfectly with the vertical position of a Y sample. But for subsampled yuv420? Is the center of the first U value aligned with the vertical position of the first Y value ["top"]? Or exactly in between the first and second Y sample ["center"]? Or (this would be strange, but theoretically it could happen) with the center of the second Y sample ["bottom"]?

Y1 U <- top    Y1                Y1
.              .  U <- center    .
Y2             Y2                Y2 U <- bottom

For context, in H.264 this is signalled by the chroma sample location type in the VUI of the SPS (the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field elements).
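To make the three sitings concrete, here is a minimal sketch that expresses the position of each chroma row in luma-row units. The names `SITING_OFFSET` and `chroma_row_position` are made up for illustration, and rows are counted from 0 here (so Y1 in the diagram is row 0).

```python
# Vertical position of 4:2:0 chroma row j, measured in luma rows
# (luma row i sits at position i). The siting names follow the diagram above.
# SITING_OFFSET and chroma_row_position are illustrative names, not a real API.
SITING_OFFSET = {"top": 0.0, "center": 0.5, "bottom": 1.0}


def chroma_row_position(j, siting):
    """Return the vertical position of chroma row j in luma-row units."""
    return 2 * j + SITING_OFFSET[siting]


for siting in ("top", "center", "bottom"):
    print(siting, [chroma_row_position(j, siting) for j in range(3)])
```

Every interpolation weight further down follows from the distance between these positions and the luma row you want a chroma value for.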

Next, what do we do with this information? Well, interpolating from yuv420 to yuv422 basically means doubling the [vertical] resolution. Imagine that you have a grayscale image and you want to increase its resolution. You use a scaling algorithm, and scaling means interpolation. The fact that the target height is an exact multiple of the source height is a special case, but it doesn't change the fact that you have to use a scaling algorithm (i.e. a scaling filter). So, what filter do you use?

Nearest neighbour is easiest: you pick the value from the closest source position:

Y1 U1in <- top               Y1 U1out=U1in
.                            .
Y2                           Y2 U2out=U1in?
.                 becomes    .
Y3 U2in                      Y3 U3out=U2in
.                            .
Y4                           Y4 U4out=U2in?

Mathematically, U2out could also be U2in, since the distance is equal. Here it also becomes obvious why chroma positioning is important; compare with center:

Y1                              Y1 U1out=U1in
.  U1in <- center               .
Y2                              Y2 U2out=U1in
.                    becomes    .
Y3                              Y3 U3out=U2in
.  U2in                         .
Y4                              Y4 U4out=U2in

Note how the question marks disappeared. Now, there's not actually any filtering going on yet, so let's get into that.
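The two nearest-neighbour cases can be sketched like this for a single column of chroma samples. `upsample_nearest` and its `tie` parameter are illustrative names only; `tie` decides what happens at the question marks, where two source samples are equally far away.

```python
def upsample_nearest(chroma, siting="center", tie="up"):
    """Vertically double one column of 4:2:0 chroma samples (nearest neighbour).

    chroma: list of N samples; returns a list of 2*N samples.
    tie:    "up" prefers the source sample above, "down" the one below.
    Illustrative sketch, not taken from any particular library.
    """
    offset = {"top": 0.0, "center": 0.5}[siting]
    n = len(chroma)
    out = []
    for i in range(2 * n):  # i = output row, aligned with luma row i
        def cost(j):
            # Distance from output row i to source sample j at 2*j + offset,
            # with the index as tie-breaker.
            return (abs(2 * j + offset - i), j if tie == "up" else -j)
        out.append(chroma[min(range(n), key=cost)])
    return out


print(upsample_nearest([10, 20], "top", tie="up"))
print(upsample_nearest([10, 20], "top", tie="down"))
print(upsample_nearest([10, 20], "center"))
```

With "top" siting the result depends on the tie-break; with "center" siting it does not, because no output row is ever equidistant from two source samples. The brute-force search over all source samples is for clarity only.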

The easiest filter is bilinear (or in 1D: linear). Here, you take two U samples and interpolate them into one, where the weight of each source sample is decided by its relative distance to the destination sample.

Y1 U1in <- top               Y1 U1out=U1in
.                            .
Y2                           Y2 U2out=(U1in+U2in)/2
.                 becomes    .
Y3 U2in                      Y3 U3out=U2in
.                            .
Y4                           Y4 U4out=(U2in+U3in)/2

or:

Y1                              Y1 U1out=U1in
.  U1in <- center               .
Y2                              Y2 U2out=(U1in*3+U2in)/4
.                    becomes    .
Y3                              Y3 U3out=(U1in+U2in*3)/4
.  U2in                         .
Y4                              Y4 U4out=(U2in*3+U3in)/4
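The weights in the two diagrams above translate into the following sketch for one column of integer chroma samples. The function name `upsample_linear`, the edge clamping, and the rounding (adding half the divisor before the integer division) are my own choices, not something a standard prescribes.

```python
def upsample_linear(chroma, siting="center"):
    """Vertically double one column of chroma samples with linear interpolation.

    chroma: list of N integer samples; returns a list of 2*N integer samples.
    Illustrative sketch; edges are handled by repeating the first/last sample.
    """
    n = len(chroma)

    def at(j):
        # Clamp the index so the first and last samples are repeated at the edges.
        return chroma[min(max(j, 0), n - 1)]

    out = []
    for j in range(n):
        if siting == "top":
            # Output row 2j is co-sited with source j; row 2j+1 lies halfway
            # between source j and source j+1.
            out.append(at(j))
            out.append((at(j) + at(j + 1) + 1) // 2)
        else:  # "center"
            # Source j sits at 2j + 0.5, so each output row is 0.5 away from
            # one source sample (weight 3/4) and 1.5 away from the other (1/4).
            out.append((at(j - 1) + 3 * at(j) + 2) // 4)
            out.append((3 * at(j) + at(j + 1) + 2) // 4)
    return out


print(upsample_linear([0, 100], "top"))     # [0, 50, 100, 100]
print(upsample_linear([0, 100], "center"))  # [0, 25, 75, 100]
```

Note that with "center" siting no output sample is a plain copy of an input sample, except at the clamped edges.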

As you look up more filtering algorithms (e.g. on Wikipedia), you'll notice that this is a whole research area and that more complicated algorithms are available, such as bicubic (or in 1D: cubic) or Lanczos. IMO it goes too far to explain them here; just look up the functions on Wikipedia and use whichever you need. Which one is right for you is a matter of taste - or better, it basically depends on how you want to balance quality and speed. Filters with more taps (Lanczos > cubic > linear > nearest-neighbour) will generally give better quality, but will also be computationally slower.
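All of these filters share one structure: express the output position in source-sample units, weight the surrounding source samples with a kernel function, and normalize. A rough sketch, assuming 8-bit samples and edge clamping (the names `upsample_kernel`, `triangle` and `lanczos` are made up here; the Lanczos kernel is the usual sinc(x)*sinc(x/a) definition):

```python
import math


def triangle(x):
    """Linear-interpolation kernel (radius 1)."""
    return max(0.0, 1.0 - abs(x))


def lanczos(x, a=2):
    """Lanczos kernel sinc(x) * sinc(x / a) for |x| < a, else 0 (radius a)."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def upsample_kernel(chroma, kernel, radius, siting="center"):
    """Vertically double one column of 8-bit chroma samples with any kernel.

    Illustrative sketch: unoptimized, clamps at the edges, clips to 0..255.
    """
    offset = {"top": 0.0, "center": 0.5}[siting]
    n = len(chroma)
    out = []
    for i in range(2 * n):
        s = (i - offset) / 2.0  # output row i in source-sample units
        base = math.floor(s)
        acc = wsum = 0.0
        for j in range(base - radius + 1, base + radius + 1):
            w = kernel(s - j)
            acc += w * chroma[min(max(j, 0), n - 1)]
            wsum += w
        # Normalize, round to nearest, and clip to the 8-bit range.
        out.append(min(255, max(0, int(acc / wsum + 0.5))))
    return out


print(upsample_kernel([0, 100], triangle, 1))            # same as linear
print(upsample_kernel([16, 16, 240, 240], lanczos, 2))   # 4-tap Lanczos
```

With the `triangle` kernel this reproduces the linear weights from the diagrams; swapping in `lanczos` with `radius=2` gives a 4-tap filter without changing the loop. Real implementations precompute the weights, since for a 2x factor there are only two distinct sets of them.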

Lastly, you mentioned that you're interested in doing this yourself, which is why I explain all this here. But please understand that writing a bug-free, high-quality multi-tap filtering function (e.g. for Lanczos, or even bicubic) will actually take quite some time and effort, and will require significant knowledge of vector processing (SIMD, e.g. x86 AVX/SSE or Arm NEON) to be practically useful. If your end goal is to use this in any serious setting, you probably do want to use existing software that implements these algorithms, e.g. swscale in ffmpeg, simply because it already implements all of this.

Just duplicate every sample of the U and V planes vertically (i.e. repeat each chroma row). Or you may take the average of vertically adjacent samples.

You may as well step through libswscale (from ffmpeg) in a debugger to see what it does. If that is difficult or you don't know how to do it, take some picture, convert it to YUV420, convert that YUV420 to YUV422, then print a few bytes from the source U plane and from the result U plane and see what kind of math was done. Most likely, simple doubling will give you visually acceptable results.
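For the simple doubling approach on an actual NV12 buffer, a sketch could look like the following. It assumes a tightly packed frame (no stride padding), even width and height, and NV16 (semi-planar 4:2:2, the 4:2:2 counterpart of NV12) as the output layout; `nv12_to_nv16_repeat` is a hypothetical helper name.

```python
def nv12_to_nv16_repeat(frame, width, height):
    """Convert a packed NV12 frame to NV16 by repeating each chroma row.

    frame: bytes holding width*height luma bytes followed by height/2 rows
           of interleaved chroma (U0 V0 U1 V1 ...), each row width bytes long.
    Illustrative sketch; assumes even dimensions and no row padding.
    """
    y_size = width * height
    luma = frame[:y_size]
    chroma = frame[y_size:y_size + y_size // 2]
    out = bytearray(luma)  # the luma plane is unchanged
    for r in range(height // 2):
        row = chroma[r * width:(r + 1) * width]  # one interleaved UV row
        out += row  # output chroma row 2r
        out += row  # output chroma row 2r + 1
    return bytes(out)


# 2x2 frame: four luma bytes, then a single U/V pair.
print(list(nv12_to_nv16_repeat(bytes([1, 2, 3, 4, 9, 8]), 2, 2)))
```

Because U and V are interleaved within a row, the doubling has to happen per row, not per byte; repeating individual bytes would scramble the U/V order.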


 