
Cartesian Product of Weighted Elements

I have a collection of sets of elements where each element has a value (0..1) attached to it (the actual container type doesn't matter). I'm iterating over the cartesian products, i.e. combinations with one element taken from each set, something like this:

import random
import itertools

stuff = [[random.random() for _ in range(random.randint(2,3))] for _ in range(2)]

for combo in itertools.product(*stuff):
    print(sum(combo))  # yield in actual application

Easy enough, but I would like to get combinations with a higher summed value first. This doesn't need to be deterministic; it would be enough to have a significantly higher chance of getting a high-value combination before a low-value one.

Is there a clever way of doing this without creating all combinations first? Maybe by sorting/shifting the element sets in a certain way?

There is indeed a better way to do this: first sort the collections in descending order, and then iterate such that we select the initial elements of each collection first. Since they are sorted, this ensures we generally get high-value combinations first.

Let us build our intuition in steps, plotting the results along the way. I have found this helps a great deal in understanding the method.

Current method

First, your current method (edited lightly for clarity).

import random
import itertools
import matplotlib.pyplot as plt

list1 = [random.random() for _ in range(50)]
list2 = [random.random() for _ in range(50)]

values = []

for combo in itertools.product(list1, list2):
    values.append(sum(combo))
    print(sum(combo))           # yield in actual application

plt.plot(values)
plt.show()

Resulting in,

[figure: current method - summed values in yield order]

That is just all over the place! We can already do better by imposing some sorted structure. Let us explore this next.

Pre-sorting the lists

list1 = [random.random() for _ in range(50)]
list2 = [random.random() for _ in range(50)]

list1.sort(reverse=True)
list2.sort(reverse=True)

for combo in itertools.product(list1, list2):
    print(sum(combo))           # yield in actual application

Which yields,

[figure: pre-sorted lists - summed values in yield order]

Look at the structure of that beauty! Can we exploit this to yield the largest elements first?

Exploiting the structure

For this part, we will have to let go of itertools.product, as it is too general for our purposes. A similar function is easily written, and when we do so we can exploit the regularity of our data. What do we know about the peaks in figure 2? Since the data is sorted, they must all occur at low indices. If we imagine the indices into our collections as a higher-dimensional space, this means we should prefer points close to the origin - at least initially.

The following 2-D figure supports our intuition,

[figure: index structure of the solution space]
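The same intuition can be checked numerically. With both weight lists sorted in descending order (the values below are made up for illustration), the matrix of pairwise sums is nonincreasing along every row and column, so the largest sums cluster around the origin:

```python
import numpy as np

# Two small descending-sorted weight lists (made-up values for illustration).
a = np.array([0.9, 0.6, 0.2])
b = np.array([0.8, 0.5, 0.1])

# sums[i, j] holds a[i] + b[j], i.e. the value of combination (i, j).
sums = a[:, None] + b[None, :]

# Descending-sorted input makes the sums nonincreasing along both axes ...
assert np.all(np.diff(sums, axis=0) <= 0)
assert np.all(np.diff(sums, axis=1) <= 0)

# ... so the single largest combination sits at the origin.
i, j = np.unravel_index(np.argmax(sums), sums.shape)
print(int(i), int(j))  # 0 0
```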

A graph-based walk through our matrix should suffice, making sure we move to a new element every time. Now, the implementation I provide below does build up a set of visited nodes, which is not what you want. Luckily, all visited nodes not on the 'frontier' (the currently reachable but unvisited nodes) can be deleted, which should limit space complexity considerably. I leave it up to you to come up with a clever way to do so.

The code,

import random
import itertools
import heapq


def neighbours(node):       # see https://stackoverflow.com/a/45618158/4316405
    for relative_index in itertools.product((0, 1), repeat=len(node)):
        yield tuple(i + i_rel for i, i_rel
                    in zip(node, relative_index))


def product(*args):
    heap = [(0, tuple([0] * len(args)))]    # origin
    seen = set()

    while len(heap) != 0:                   # while not empty
        idx_sum, node = heapq.heappop(heap)

        for neighbour in neighbours(node):
            if neighbour in seen:
                continue

            if any(dim == len(arg) for dim, arg in zip(neighbour, args)):
                continue                    # should not go out-of-bounds

            heapq.heappush(heap, (sum(neighbour), neighbour))

            seen.add(neighbour)

            yield [arg[idx] for arg, idx in zip(args, neighbour)]


list1 = [random.random() for _ in range(50)]
list2 = [random.random() for _ in range(50)]

list1.sort(reverse=True)
list2.sort(reverse=True)

for combo in product(list1, list2):
    print(sum(combo))

The code walks along the frontier, each time selecting the node with the lowest index sum (a heuristic for closeness to the origin). This works quite well, as the following figure shows,

[figure: exploiting the structure via the graph walk - summed values in yield order]
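On the space issue mentioned above (the growing set of visited nodes), one observation helps: the heap pops nodes in nondecreasing index-sum order, and every neighbour's index sum is at least that of the node that generated it, so any visited node whose index sum is strictly below the currently popped sum can never be produced again and may be forgotten. A self-contained sketch of this pruning (the set is rebuilt eagerly on every pop for clarity; a real implementation would prune in batches):

```python
import heapq
import itertools


def neighbours(node):
    # All nodes reachable by incrementing each coordinate by 0 or 1.
    for rel in itertools.product((0, 1), repeat=len(node)):
        yield tuple(i + r for i, r in zip(node, rel))


def product_pruned(*args):
    """Heap-based product as above, but evicting stale 'seen' entries.

    Future pops have index sum >= the current one, and their neighbours'
    sums are no smaller still, so entries below the current popped sum
    are unreachable and safe to forget.
    """
    heap = [(0, tuple([0] * len(args)))]    # start at the origin
    seen = set()

    while heap:
        idx_sum, node = heapq.heappop(heap)

        # Prune: entries strictly below the current sum cannot recur.
        seen = {n for n in seen if sum(n) >= idx_sum}

        for nb in neighbours(node):
            if nb in seen:
                continue
            if any(d == len(arg) for d, arg in zip(nb, args)):
                continue                    # out of bounds

            heapq.heappush(heap, (sum(nb), nb))
            seen.add(nb)
            yield [arg[i] for arg, i in zip(args, nb)]
```

On two descending-sorted three-element lists this produces all nine combinations exactly once, highest sum first, while `seen` never holds more than a few index sums' worth of nodes.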

Inspired by N. Wouda's answer I tried yet another approach. When testing their answer I noticed a pattern in the indices resembling n-ary encoding (here for 3 sets):

...
(1,1,0)
(1,1,1)
(0,0,2)
(0,1,2)
(1,0,2) <- !
(1,1,2)
(0,2,0)
(0,2,1)
(1,2,0)
...

Notice that lower numbers increase before higher ones do. So I replicated this pattern in code:

idx = np.zeros(len(args), dtype=int)  # requires numpy as np; the np.int alias is deprecated
while max(idx) < 50:  # TODO stop condition
    yield [arg[i] for arg,i in zip(args,idx)]

    low = np.min(idx)
    imin = np.argwhere(idx == low)
    inxt = np.argwhere(idx == low+1)

    idx[imin[:-1]] = 0  # everything to the left of imin[-1]
    idx[imin[-1]] += 1  # increase the last of the lowest indices
    idx[inxt[inxt > imin[-1]]] = 0  # everything to the right

I took some shortcuts since I was just testing; the results are not too bad. While in the beginning this function outperforms N. Wouda's solution, it becomes worse the longer it runs. I think the "index wave" is shaped differently, resulting in more noise for indices further away from the origin.
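For completeness, here is a self-contained version of the fragment above, wrapped in a generator (the name nary_product is made up here), with the removed np.int alias replaced by the builtin int and the hard-coded 50 derived from the input:

```python
import numpy as np


def nary_product(*args):
    # Hypothetical wrapper around the fragment above. Assumes all lists
    # share one length, as the original's hard-coded bound of 50 did.
    n = len(args[0])
    idx = np.zeros(len(args), dtype=int)  # np.int was removed in NumPy 1.24
    while max(idx) < n:  # TODO proper stop condition
        yield [arg[i] for arg, i in zip(args, idx)]

        low = np.min(idx)
        imin = np.argwhere(idx == low)
        inxt = np.argwhere(idx == low + 1)

        idx[imin[:-1]] = 0              # reset everything left of imin[-1]
        idx[imin[-1]] += 1              # bump the last of the lowest indices
        idx[inxt[inxt > imin[-1]]] = 0  # reset the next level to its right
```

On two equal-length lists this reproduces the index pattern listed earlier.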

[figure: value vs. n-th product]

Interesting!

Edit: I thought this was quite interesting, so I visualized the way the indices are iterated over - JFYI :)

[figure: index wavefront, N. Wouda's answer]

[figure: index wavefront, this answer]
