Need help vectorizing this code

Question

I have an 8-bit image. For each pixel, I need to work out its ordinal position in the current row. For example, if the row is:

32 128 16 64,

then I need the result:

1 3 0 2,

since, counting from 0 in ascending order, 32 is the 1st-lowest value in the row, 128 is the 3rd, 16 is the 0th and 64 is the 2nd.

I need to repeat the above procedure for all rows of the image. Here is the non-vectorized code:

for (int curr = 0; curr < new_height; ++curr)
{
    vector<pair<unsigned char, char> > ordered;
    for (char i = 0; i < 4; ++i)
    {
        unsigned char val = luma24.at<unsigned char>(curr, i);
        ordered.push_back(pair<unsigned char, char>(val, i));
    }
    sort(ordered.begin(), ordered.end(), cmpfun);
    for (int i = 0; i < 4; ++i)
        signature.at<char>(curr, ordered[i].second) = i;
}

luma24 is the 8-bit image I'm reading from, and it has new_height rows and 4 columns. signature is a signed image of the same size (ignore the difference in sign for now, since it's not relevant); it's where I'm storing the result. cmpfun is a trivial comparator function.

I tried to vectorize the above code and got this:

Mat ordinal;
luma24.convertTo(ordinal, CV_16UC1, 256, 0);
Mat sorted = ordinal.clone();
for (int i = 0; i < 4; ++i)
    ordinal(Range::all(), Range(i, i+1)) += i;
cv::sort(ordinal, sorted, CV_SORT_EVERY_ROW | CV_SORT_ASCENDING);
bitwise_and(sorted, Scalar(0x00ff), ordinal);
Mat ordinal8;
ordinal.convertTo(ordinal8, CV_8SC1, 1, 0);
ordinal8.copyTo(signature(Range::all(), Range(0, 4)));

I had to pack the 8-bit value and the 8-bit ordinal into a single 16-bit channel since OpenCV doesn't perform sort for multi-channel images. This is almost what I need, but not quite. For the example input, it gives me:

2 0 3 1

since the lowest value is in the 2nd column, the next-lowest is in the 0th column, etc. How do I go about converting this to the result I need without accessing each pixel individually?

Essentially, I need to somehow vectorize this:

uint8_t x[] = {2, 0, 3, 1};
uint8_t y[4];
for (uint8_t i = 0; i < 4; ++i)
    y[x[i]] = i;

where x is the intermediate result my current vectorized code gives me and y is the result I want.

Can it be done?
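For illustration only, and not taken from either answer below: y is the inverse permutation of x, and one way to get an inverse permutation without scattered writes is to apply the same pack-and-sort trick a second time. The sketch below does this for a single row in plain C++ with std::sort; the function name invertBySorting is hypothetical. Whether the same step maps cleanly onto a second cv::sort over a repacked matrix is an assumption that has not been checked here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: inverts a 4-element permutation x, i.e. returns y with
// y[x[i]] = i, by packing (x[i], i) into 16 bits and sorting the packed keys.
std::array<std::uint8_t, 4> invertBySorting(const std::array<std::uint8_t, 4>& x) {
    std::array<std::uint16_t, 4> packed{};
    for (std::size_t i = 0; i < packed.size(); ++i) {
        assert(x[i] < packed.size());  // x must be a permutation of 0..3
        // High byte: the value to sort by. Low byte: the column it came from.
        packed[i] = static_cast<std::uint16_t>((x[i] << 8) | i);
    }
    std::sort(packed.begin(), packed.end());

    std::array<std::uint8_t, 4> y{};
    for (std::size_t k = 0; k < packed.size(); ++k) {
        // After sorting, slot k holds the pair whose x[i] equals k,
        // so its low byte is i, which is exactly y[k].
        y[k] = static_cast<std::uint8_t>(packed[k] & 0x00ff);
    }
    return y;
}
```

With x = {2, 0, 3, 1} the packed keys are 512, 1, 770 and 259; sorted, their low bytes read 1, 3, 0, 2, which is the y the loop above produces.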

Answer 1

I believe this will do the trick for you. It doesn't require allocations, stacks, or sorts, but it does assume your values are in the range 0-255 (e.g. uint8). The bigger assumption: it will only perform well if you have wide rows. If they're really 4 pixels wide, that i<256 loop is kind of ugly. There are ways to make that go away, but I'm assuming the 4 pixels is just an example for simplicity.

void processRow (int* rowpos, uint8_t* pixelsForRow, int w) {
   uint32_t i, pv, v=0, hist[256]={0};
   for (i=0; i<w; i++)      hist[pixelsForRow[i]]++;
   for (i=0; i<256; i++)    {pv=hist[i]; hist[i]=v; v+=pv;}
   for (i=0; i<w; i++)      rowpos[i] = hist[pixelsForRow[i]]++;
}

OK, so how does it work?
Line 1 of the function body declares and zeroes a histogram table.
Line 2 computes a histogram.
Line 3 turns it into a counting sort, which is why hist uses a larger element type than uint8.
Line 4 applies the sorted position.

There are 2 tricks. First, in line 3, the histogram is "shifted by 1 index" (an exclusive prefix sum), so that the first entry is always 0 rather than its own count, the second entry is what the first count would have been, and so on. The second trick is the "++" in line 4, which ensures the ordinal values are always unique, even when pixel values repeat.

Let's try it on your input:
[32 128 16 64]
line 2: [0...1....1....1...1...0] at indices [0, 16, 32, 64, 128, 255] respectively
line 3: [0...0....1....2...3...4] at indices [0, 16, 32, 64, 128, 255] respectively
line 4: [1, 3, 0, 2] ... looks right

Let's try it on slightly different input:
[32 128 16 32]
line 2: [0...1....2....0...1...0] at indices [0, 16, 32, 64, 128, 255] respectively
line 3: [0...0....1....3...3...4] at indices [0, 16, 32, 64, 128, 255] respectively
line 4: [1, 3, 0, 2] ... perfect

but I'm not quite sure if it meets your need for vectorization :)
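The two walkthroughs above can be reproduced with a small standalone variant of the same counting-sort idea. This is a sketch rather than the answer's exact code: the name rankRow and the use of std::vector and std::array are assumptions made so the example compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counting-sort rank for one row: out[i] is the position pixel i would take
// in an ascending, stable sort of the row. Ties are ranked left to right.
std::vector<int> rankRow(const std::vector<std::uint8_t>& row) {
    std::array<std::uint32_t, 256> hist{};          // zero-initialised histogram
    for (std::uint8_t p : row) ++hist[p];           // count each pixel value

    std::uint32_t running = 0;
    for (std::uint32_t& h : hist) {                 // exclusive prefix sum
        const std::uint32_t count = h;
        h = running;                                // number of strictly smaller pixels
        running += count;
    }
    assert(running == row.size());                  // every pixel was counted once

    std::vector<int> out(row.size());
    for (std::size_t i = 0; i < row.size(); ++i)
        out[i] = static_cast<int>(hist[row[i]]++);  // post-increment separates ties
    return out;
}
```

Both example rows, {32, 128, 16, 64} and {32, 128, 16, 32}, come out as {1, 3, 0, 2}. In the second row the two 32s receive ranks 1 and 2 in left-to-right order because of the post-increment.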

Answer 2

Another way I can think of is, for each row, to create a binary search tree. While doing an in-order traversal we can get the rank of each pixel.

Each node of the tree is a structure:

// Members of the struct:
// row_pos: position of the pixel within its row.
//     Populated while building the binary search tree.
//
// rank: rank of the pixel within its row.
//     Known only during the in-order traversal; at that point the
//     pixel's location is updated with its rank.

struct node
{
    int row_pos, rank;
    node *left, *right;    // left and right child nodes
};

The sequence of steps for every row would be:

a) O(w log w) on average: create a binary search tree, storing every pixel's position in its node. (A single insertion costs O(log w) on average, and up to O(w) if the tree is unbalanced, so building the tree is not O(w).)

b) O(w): do an in-order traversal. For every node, fill the pixel location of that node with its rank (start counting with the first node as 0).
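Steps a) and b) can be sketched as follows. This is a minimal, unbalanced tree written for illustration: the names Node, assignRanks and rankRowWithBst are hypothetical, child links are stored as vector indices instead of pointers to avoid manual memory management, and sending equal values to the right subtree is an assumed tie-breaking rule that the answer does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node layout: the pixel's position in the row plus child links
// stored as indices into a vector (-1 means "no child").
struct Node {
    int row_pos;
    int left;
    int right;
};

// Step b): the in-order walk visits pixels in ascending order and hands out
// ranks 0, 1, 2, ... as it goes.
void assignRanks(const std::vector<Node>& nodes, int current, int& next_rank,
                 std::vector<int>& rank) {
    if (current < 0) return;
    assignRanks(nodes, nodes[current].left, next_rank, rank);
    rank[nodes[current].row_pos] = next_rank++;
    assignRanks(nodes, nodes[current].right, next_rank, rank);
}

std::vector<int> rankRowWithBst(const std::vector<std::uint8_t>& row) {
    // Step a): insert pixels one by one; node i describes pixel i.
    std::vector<Node> nodes;
    nodes.reserve(row.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        nodes.push_back(Node{static_cast<int>(i), -1, -1});
        if (i == 0) continue;  // the first pixel becomes the root
        int cur = 0;
        while (true) {
            // Equal values go right, so ties keep their left-to-right order.
            int& child = (row[i] < row[nodes[cur].row_pos]) ? nodes[cur].left
                                                            : nodes[cur].right;
            if (child < 0) { child = static_cast<int>(i); break; }
            cur = child;
        }
    }

    std::vector<int> rank(row.size());
    int next_rank = 0;
    assignRanks(nodes, row.empty() ? -1 : 0, next_rank, rank);
    assert(next_rank == static_cast<int>(row.size()));  // every pixel was visited
    return rank;
}
```

For the row {32, 128, 16, 64} the traversal visits 16, 32, 64, 128, giving ranks {1, 3, 0, 2}. Because the tree is not balanced, an already-sorted row degrades insertion to O(w) per pixel; for rows as narrow as 4 pixels this hardly matters.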
