
How to find elements that match a specific pattern in a 2D list

I would like to find an efficient way to retrieve all the elements in an array that match a specific pattern.

For example, suppose I have:

  • an array M composed of sub-arrays of different sizes:

     M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]
  • a pattern of sub-arrays. For example, [[a, b], [a, c], [a, d]] matches [[0, 1], [0, 2], [0, 3]].

How can I return all the elements of M that match the pattern?

So far I have been using for loops to find matching elements, but this naive approach becomes very costly when the pattern has more than two sub-arrays.

Example:

M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]

# pattern with 3 sub-arrays -> [[a, b], [a, c], [a, d]]

for i, arr1 in enumerate(M):
    for j, arr2 in enumerate(M):
        for k, arr3 in enumerate(M):
            # indices must be pairwise distinct (i != j != k alone never compares i with k)
            if i != j and j != k and i != k:
                if len(arr1) == len(arr2) == len(arr3) == 2:
                    a1, a2, a3 = arr1[0], arr2[0], arr3[0]
                    b, c, d = arr1[1], arr2[1], arr3[1]
                    if a1 == a2 == a3 and b < c < d:
                        print(arr1, arr2, arr3, sep=', ')

Output:

[0, 1], [0, 2], [0, 3]
[3, 1], [3, 7], [3, 8]

Since each sub-array in the pattern adds another nested loop, this method runs in O(n^k) time, where n is the length of M and k is the number of sub-arrays in the pattern, which quickly becomes a problem.
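To make that cost concrete, here is a sketch that generalizes the nested loops to a pattern of k sub-arrays using itertools.permutations (the helper name brute_force is hypothetical). It tests every ordered k-tuple of distinct sub-arrays, n*(n-1)*...*(n-k+1) candidates, which is exactly the O(n^k) growth described above.

```python
import itertools


def brute_force(M, k):
    """Hypothetical helper: naive search for [[a, b1], ..., [a, bk]] with b1 < ... < bk.

    permutations(M, k) yields every ordered k-tuple of distinct positions,
    so the number of candidates grows as O(n^k).
    """
    matches = []
    for combo in itertools.permutations(M, k):
        # every sub-array in the candidate must have exactly two elements
        if any(len(arr) != 2 for arr in combo):
            continue
        firsts = [arr[0] for arr in combo]
        seconds = [arr[1] for arr in combo]
        # same first element everywhere, strictly increasing second elements
        if len(set(firsts)) == 1 and all(x < y for x, y in zip(seconds, seconds[1:])):
            matches.append(list(combo))
    return matches


M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]
for match in brute_force(M, 3):
    print(*match, sep=', ')
```

Because permutations walks the index tuples in the same order as the three nested loops, this prints the same two matches as the original code.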

Is it possible to speed up this process? If so, how?

First, before jumping into numpy, let's take a look at your conditions. You require the sub-arrays to have exactly two elements, so let's pre-filter your array:

M = [m for m in M if len(m) == 2]

Now you are checking a1 == a2 == a3 and b < c < d, but because the triple loop visits every ordering of any three sub-arrays, every permutation of a given b, c, d shows up in the sequence. So if you find any pairwise-distinct b, c, d for a given a, you can simply sort them into the right order, knowing that that order will show up eventually.
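A quick illustration of that argument, using the second elements that share a = 3 in the example: of the 3! orderings the loops visit, only the ascending one passes the b < c < d test, and it is just the sorted values.

```python
import itertools

bcd = {8, 1, 7}  # second elements of [3, 8], [3, 1], [3, 7]

# every ordering the triple loop would eventually try
orders = list(itertools.permutations(bcd, 3))
# the ones that survive the b < c < d filter
increasing = [p for p in orders if p[0] < p[1] < p[2]]

print(len(orders), increasing)  # 6 [(1, 7, 8)]
```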

A very simple way to handle this is therefore to build a dictionary mapping each a to all possible options for b, c, d, skip any group with fewer values than the number of sub-arrays you want, sort each group, and compute all the possible combinations:

import collections
import itertools

# a set removes duplicates automatically
options = collections.defaultdict(set)

for a, b in (m for m in M if len(m) == 2):  # use a generator to filter on the fly
    options[a].add(b)

for a, bcd in options.items():
    # sort; combinations() automatically skips bins with fewer than 3 values
    for b, c, d in itertools.combinations(sorted(bcd), 3):
        print(f'[{a}, {b}], [{a}, {c}], [{a}, {d}]')
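The same grouping works for a pattern of any length k, which is what the question ultimately asks about. A sketch, assuming the pattern is always [[a, b1], [a, b2], ..., [a, bk]] with strictly increasing second elements; find_patterns is a hypothetical name:

```python
import collections
import itertools


def find_patterns(M, k):
    """Hypothetical helper: return every [[a, b1], ..., [a, bk]] with b1 < ... < bk."""
    # single pass: group the second elements of two-element sub-arrays by first element
    options = collections.defaultdict(set)
    for m in M:
        if len(m) == 2:
            a, b = m
            options[a].add(b)

    matches = []
    for a, bs in options.items():
        # the set holds unique values, so combinations() of the sorted values
        # yields strictly increasing tuples, and nothing at all if len(bs) < k
        for combo in itertools.combinations(sorted(bs), k):
            matches.append([[a, b] for b in combo])
    return matches


M = [[0, 1], [3, 2, 4], [3, 8], [9], [0, 2], [3, 1], [0, 3], [2, 4], [3, 7]]
for match in find_patterns(M, 3):
    print(*match, sep=', ')
```

The work is one pass over M plus one iteration per reported match, so the cost no longer multiplies by n for each extra sub-array in the pattern.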

This solution is likely algorithmically optimal: it makes a single pass over the initial list to identify potential patterns, and then performs exactly one iteration per pattern. The only thing potentially missing is that duplicates are eliminated entirely (if [0, 2] appears twice in M, the naive loops report two matches, while this version reports one). You can handle duplicates by using collections.Counter instead of set.
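One possible reading of the Counter suggestion, as a sketch: keep a count per second element, report each distinct value pattern once, and attach the number of index-distinct matches the naive loops would print for it, which is the product of the counts. The helper name find_patterns_counted is hypothetical, and math.prod needs Python 3.8 or later.

```python
import collections
import itertools
from math import prod


def find_patterns_counted(M, k):
    """Hypothetical helper: like the set-based version, but remembers multiplicities."""
    options = collections.defaultdict(collections.Counter)
    for m in M:
        if len(m) == 2:
            a, b = m
            options[a][b] += 1  # count repeated [a, b] sub-arrays

    results = []
    for a, counts in options.items():
        # iterate over distinct second elements, in ascending order
        for combo in itertools.combinations(sorted(counts), k):
            # number of ways to pick one occurrence of each [a, b] in the pattern
            multiplicity = prod(counts[b] for b in combo)
            results.append(([[a, b] for b in combo], multiplicity))
    return results


M = [[0, 1], [0, 2], [0, 2], [0, 3]]
for pattern, n in find_patterns_counted(M, 3):
    print(*pattern, sep=', ', end=f'  (x{n})\n')
```

Here [0, 2] appears twice, so the single pattern is reported with multiplicity 2, matching the two results the nested loops would print.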
