Using output allocations of indeterminate size in Renderscript

Question

I'm trying to work out the most efficient way to deal with arrays of indeterminate size as outputs of RS kernels. I would send the index of the last relevant array slot in the out allocation, but as I learned from the answer to my previous question, there's no good way to pass a global back to Java after kernel execution. I've decided to "zoom out" the process again, which led me to the pattern below.

For example, let's say we have an input allocation containing a struct (or structs) that contains two arrays of polar coordinates, something like set_pair below:

typedef struct polar_tag{
  uint8_t angle;
  uint32_t mag;
} polar;

typedef struct polar_set_tag{
  uint8_t filled_slots;
  polar coordinates[60];
} polar_set;

typedef struct set_pair_tag{
  polar_set probe_set;
  polar_set candidate_set;
} set_pair;

We want to find similar coordinate pairs between the sets, so we set up a kernel to decide which (if any) of the polar coordinates are similar. If two are similar, we load the pair into an output allocation that looks something like "matching_set":

typedef struct matching_pair_tag{
  uint8_t probe_index;
  uint8_t candidate_index;
} matching_pair;

typedef struct matching_set_tag{
  matching_pair pairs[120];
  uint8_t filled_slots;
} matching_set;

Is creating allocations with bookkeeping fields like "filled_slots" the most efficient (or only) way to handle this sort of indeterminate I/O with RS, or is there a better way?
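For concreteness, the pattern being asked about can be sketched in plain C as below. This is only an illustration under assumptions: the similarity test, the tolerances ANGLE_TOL and MAG_TOL, and the helpers is_similar, matching_set_push and find_matches are hypothetical names that do not appear in the original post, and the loop runs serially rather than as an RS kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COORDS 60
#define MAX_PAIRS 120
#define ANGLE_TOL 2   /* assumed tolerance, not from the post */
#define MAG_TOL 10    /* assumed tolerance, not from the post */

typedef struct polar_tag { uint8_t angle; uint32_t mag; } polar;

typedef struct polar_set_tag {
  uint8_t filled_slots;
  polar coordinates[MAX_COORDS];
} polar_set;

typedef struct set_pair_tag {
  polar_set probe_set;
  polar_set candidate_set;
} set_pair;

typedef struct matching_pair_tag {
  uint8_t probe_index;
  uint8_t candidate_index;
} matching_pair;

typedef struct matching_set_tag {
  matching_pair pairs[MAX_PAIRS];
  uint8_t filled_slots; /* number of valid entries in pairs[] */
} matching_set;

/* Hypothetical similarity test: both components within a tolerance. */
int is_similar(polar a, polar b) {
  int da = (int)a.angle - (int)b.angle;
  if (da < 0) da = -da;
  uint32_t dm = a.mag > b.mag ? a.mag - b.mag : b.mag - a.mag;
  return da <= ANGLE_TOL && dm <= MAG_TOL;
}

/* Append one match; returns 0 (and stores nothing) when the set is full. */
int matching_set_push(matching_set *out, uint8_t probe, uint8_t candidate) {
  assert(out != 0);
  if (out->filled_slots >= MAX_PAIRS) return 0;
  out->pairs[out->filled_slots].probe_index = probe;
  out->pairs[out->filled_slots].candidate_index = candidate;
  out->filled_slots++;
  return 1;
}

/* Compare every probe coordinate with every candidate coordinate and
   record the similar ones; filled_slots tells the reader where to stop. */
void find_matches(const set_pair *in, matching_set *out) {
  out->filled_slots = 0;
  for (uint8_t p = 0; p < in->probe_set.filled_slots; p++) {
    for (uint8_t c = 0; c < in->candidate_set.filled_slots; c++) {
      if (is_similar(in->probe_set.coordinates[p],
                     in->candidate_set.coordinates[c])) {
        matching_set_push(out, p, c);
      }
    }
  }
}
```

The consumer of the output only reads pairs[0] through pairs[filled_slots - 1]; anything past that index is unspecified.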

Answer 1

I think the way I would approach this is to do it in two passes.

For the 0-2 case:

Setup: for each coordinate, allocate an array to hold the max expected number of pairs (2).

Pass 1: run over the coords and look for pairs by comparing the current item to a subset of the other coords. Choose the subset so that you avoid duplicate answers when the kernel runs on the other coord being compared.

Pass 2: merge the results from pass 1 back into a list or whatever other data structure you want. This could run as an invokable if the number of coordinates is small.
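A rough sketch of the setup and the two passes, written as plain C so it can be run on a host machine. It is not RenderScript code: pass1_one_coord stands in for the body of one kernel invocation, pass2_merge for the serial merge, and all names, the similarity test and its tolerances are assumptions made for illustration. The "subset" here is simply every coordinate with a higher index, which is one way to make sure a pair is reported only once.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PER_COORD 2 /* the "0-2" bound from the setup step */
#define ANGLE_TOL 2     /* assumed tolerance */
#define MAG_TOL 10      /* assumed tolerance */

typedef struct { uint8_t angle; uint32_t mag; } polar;

/* Per-coordinate result array allocated during setup. */
typedef struct {
  uint8_t partner[MAX_PER_COORD];
  uint8_t count;
} coord_matches;

typedef struct { uint8_t first; uint8_t second; } index_pair;

/* Hypothetical similarity test. */
int is_similar(polar a, polar b) {
  int da = (int)a.angle - (int)b.angle;
  if (da < 0) da = -da;
  uint32_t dm = a.mag > b.mag ? a.mag - b.mag : b.mag - a.mag;
  return da <= ANGLE_TOL && dm <= MAG_TOL;
}

/* Pass 1: work done for coordinate i. It only looks at j > i, so the
   invocation for j never reports the same pair again, and it writes
   only to its own slot, so no synchronization is needed. */
void pass1_one_coord(const polar *coords, uint32_t n, uint32_t i,
                     coord_matches *out) {
  out->count = 0;
  for (uint32_t j = i + 1; j < n && out->count < MAX_PER_COORD; j++) {
    if (is_similar(coords[i], coords[j])) {
      out->partner[out->count++] = (uint8_t)j;
    }
  }
}

/* Pass 2: serial merge of the per-coordinate arrays into one compact
   list. Returns the number of pairs written (at most capacity). */
uint32_t pass2_merge(const coord_matches *per_coord, uint32_t n,
                     index_pair *list, uint32_t capacity) {
  uint32_t total = 0;
  for (uint32_t i = 0; i < n; i++) {
    for (uint8_t k = 0; k < per_coord[i].count; k++) {
      if (total < capacity) {
        list[total].first = (uint8_t)i;
        list[total].second = per_coord[i].partner[k];
        total++;
      }
    }
  }
  return total;
}
```

The return value of pass2_merge is the length of the compacted list, so no global has to be passed back from the parallel pass.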

For the 0-N case:

This gets a lot harder. I'd likely do something similar to the above, but with the per-coord array sized for a typical number of pairs. For the (hopefully small) number of overflows, use atomics to reserve a slot in an overflow buffer. The catch is that I think most GPU drivers would not be very happy with the atomics today. It would run very well on the CPU reference implementation.
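The slot reservation could look roughly like the following. This is a host-side sketch using C11 atomics, not RenderScript code; inside a kernel, RenderScript's rsAtomicInc would presumably play the role of atomic_fetch_add. The names record_match, overflow_buffer and overflow_used, and the sizes TYPICAL_PER_COORD and OVERFLOW_CAPACITY, are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TYPICAL_PER_COORD 2  /* assumed "typical" number of pairs */
#define OVERFLOW_CAPACITY 64 /* assumed size of the shared buffer */

typedef struct { uint8_t first; uint8_t second; } index_pair;

/* Per-coordinate array, written only by that coordinate's invocation. */
typedef struct {
  uint8_t partner[TYPICAL_PER_COORD];
  uint8_t count;
} coord_matches;

/* Shared by all invocations; next may run past the capacity. */
typedef struct {
  index_pair slots[OVERFLOW_CAPACITY];
  atomic_uint next;
} overflow_buffer;

/* Record that coordinate i matches coordinate j.
   Returns 1 if the match was stored, 0 if it had to be dropped. */
int record_match(coord_matches *mine, overflow_buffer *ovf,
                 uint8_t i, uint8_t j) {
  if (mine->count < TYPICAL_PER_COORD) {
    /* Common case: private storage, no atomics involved. */
    mine->partner[mine->count++] = j;
    return 1;
  }
  /* Rare case: reserve a unique slot in the shared buffer. */
  unsigned slot = atomic_fetch_add(&ovf->next, 1u);
  if (slot >= OVERFLOW_CAPACITY) return 0;
  ovf->slots[slot].first = i;
  ovf->slots[slot].second = j;
  return 1;
}

/* Number of valid overflow entries, clamped to the capacity. */
uint32_t overflow_used(overflow_buffer *ovf) {
  unsigned n = atomic_load(&ovf->next);
  return n < OVERFLOW_CAPACITY ? n : OVERFLOW_CAPACITY;
}
```

Because the atomic is touched only on overflow, its cost depends on how often the typical size is exceeded, which is why sizing the per-coord array well matters.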

There are a lot of ways to go about this. One important decision point is how expensive the comparison that finds the matching points is relative to the cost of writing the result.

Answer 1
2 Accepted 2013-10-23 21:04:06
