Color conversion from RGB to YUV (YCoCg)

Question

I'm trying to implement a color-conversion Func that outputs to 3 separate buffers. The rgb_to_ycocg function takes one interleaved input buffer with four 8-bit channels (BGRA) and writes 3 output buffers (Y, Co and Cg), each holding 16-bit values. Currently, I'm using this piece of code:

void rgb_to_ycocg(const uint8_t *pSrc, int32_t srcStep, int16_t *pDst[3], int32_t dstStep[3], int width, int height)
{
    Buffer<uint8_t> inRgb((uint8_t *)pSrc, 4, width, height);
    Buffer<int16_t> outY(pDst[0], width, height);
    Buffer<int16_t> outCo(pDst[1], width, height);
    Buffer<int16_t> outCg(pDst[2], width, height);

    Var x, y, c;
    Func calcY, calcCo, calcCg, inRgb16;

    inRgb16(c, x, y) = cast<int16_t>(inRgb(c, x, y));

    calcY(x, y) = (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1)) + ((inRgb16(1, x, y) - (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1))) >> 1);
    calcCo(x, y) = inRgb16(2, x, y) - inRgb16(0, x, y);
    calcCg(x, y) = inRgb16(1, x, y) - (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1));

    Pipeline p = Pipeline({calcY, calcCo, calcCg});
    p.vectorize(x, 16).parallel(y);
    p.realize({ outY, outCo, outCg });
}

The issue is that I'm getting poor performance compared to the reference implementation (basic for loops in C). I understand I need to try better scheduling, but I think I'm doing something wrong in terms of input/output buffers. I've seen the tutorials and tried to come up with a way to output to multiple buffers. Using a Pipeline was the only way I could find. Would I be better off making 3 Funcs and calling them separately? Is this a correct use of the Pipeline class?

Answer 1

The big possible problem here is that you're building and compiling the pipeline every time you want to convert a single image. That would be really, really slow. Use ImageParams instead of Buffers, define the Pipeline once, and then realize it multiple times.

A second-order effect is that I think you actually want a Tuple rather than a Pipeline. A Tuple Func computes all its values in the same inner loop, which will reuse the loads from inRgb, etc. Ignoring the recompilation problem for the moment, try:

void rgb_to_ycocg(const uint8_t *pSrc, int32_t srcStep, int16_t *pDst[3], int32_t dstStep[3], int width, int height)
{
    Buffer<uint8_t> inRgb((uint8_t *)pSrc, 4, width, height);
    Buffer<int16_t> outY(pDst[0], width, height);
    Buffer<int16_t> outCo(pDst[1], width, height);
    Buffer<int16_t> outCg(pDst[2], width, height);

    Var x, y, c;
    Func out, inRgb16;  // 'out' is a single Tuple-valued Func holding (Y, Co, Cg)

    inRgb16(c, x, y) = cast<int16_t>(inRgb(c, x, y));

    out(x, y) = {
        (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1)) + ((inRgb16(1, x, y) - (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1))) >> 1),
        inRgb16(2, x, y) - inRgb16(0, x, y),
        inRgb16(1, x, y) - (inRgb16(0, x, y) + ((inRgb16(2, x, y) - inRgb16(0, x, y)) >> 1))
    };

    out.vectorize(x, 16).parallel(y);
    out.realize({ outY, outCo, outCg });
}
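
The long expressions in the Tuple are easier to check once the shared intermediate is given a name. The following is a minimal scalar sketch of the same arithmetic in plain C++, not the asker's actual reference implementation (which is not shown). All names here (YCoCg, bgr_to_ycocg, ycocg_to_bgr, rgb_to_ycocg_ref) are hypothetical, and the loop assumes densely packed rows, as the Halide Buffers above do, so the step arguments are left out. The inverse is included only to show that these lifting steps can be undone exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; only the arithmetic is taken from the code above.
struct YCoCg {
    int16_t y, co, cg;
};

// Forward transform for one BGRA pixel (alpha ignored).
// Note: >> on a negative int is an arithmetic shift on mainstream compilers
// and is guaranteed to be one since C++20.
inline YCoCg bgr_to_ycocg(uint8_t b, uint8_t g, uint8_t r) {
    const int co = int(r) - int(b);     // Co = R - B
    const int t = int(b) + (co >> 1);   // shared intermediate
    const int cg = int(g) - t;          // Cg = G - t
    const int y = t + (cg >> 1);        // Y  = t + (Cg >> 1)
    return {static_cast<int16_t>(y), static_cast<int16_t>(co),
            static_cast<int16_t>(cg)};
}

// Inverse transform: undoes the lifting steps in reverse order.
inline void ycocg_to_bgr(const YCoCg &p, uint8_t &b, uint8_t &g, uint8_t &r) {
    const int t = p.y - (p.cg >> 1);
    const int bb = t - (p.co >> 1);
    g = static_cast<uint8_t>(p.cg + t);
    b = static_cast<uint8_t>(bb);
    r = static_cast<uint8_t>(bb + p.co);
}

// Plain-loop conversion. Assumption: src holds width*height BGRA pixels with
// no row padding, and each destination plane holds width*height values.
inline void rgb_to_ycocg_ref(const uint8_t *src, int16_t *dstY, int16_t *dstCo,
                             int16_t *dstCg, int width, int height) {
    assert(src && dstY && dstCo && dstCg);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const int i = row * width + col;
            const YCoCg p =
                bgr_to_ycocg(src[4 * i + 0], src[4 * i + 1], src[4 * i + 2]);
            dstY[i] = p.y;
            dstCo[i] = p.co;
            dstCg[i] = p.cg;
        }
    }
}
```

Comparing the three planes from a loop like this against the Halide outputs on a small image is a cheap way to confirm that the Tuple version computes the same values before tuning the schedule.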

Answer 1: score 2, accepted, 2017-03-16 19:44:25