How to match two lists with the same length
I have two lists with the same length. I would like to find the matches between df and df2.
df = [[[1, 5,7,9,12,13,17],
[2,17,18,23,32,34,45],
[3,5,11,33,34,36,45]],
[[6,21,22,50,56,58,72],
[7,5,12,13,55,56,74],
[8,23,24,32,56,58,64]]]
df2 = [[[100,5,12,15,27,32,54],
[120,10,17,18,19,43,55],
[99,21,32,33,34,36,54]],
[[41,16,32,45,66,67,76],
[56,10,11,43,54,55,56],
[77,12,16,18,19,21,23]]]
I would like my output to look like this, or similar.
output = [[[[5,12,],[17]],
[[17,18],[32,34,36]]],
[[[55,56],[32]],[[56]]]
This problem is difficult because, in order to compare the arrays, you first need to compute all the sequences that could occur. When you are dealing with large arrays, this task will take a very long time.
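To give a rough feel for that cost: a list of length n has n(n+1)/2 non-empty contiguous slices, and comparing every slice on one side with every slice on the other squares that number. The helper name count_slices below is my own, used only for illustration, and the figures assume the 6 inner lists of 7 elements from the question.

```python
def count_slices(n):
    """Number of non-empty contiguous slices of a list of length n."""
    return n * (n + 1) // 2


# Each inner list in the question has 7 elements.
per_list = count_slices(7)       # slices produced by one inner list
per_side = 6 * per_list          # df and df2 each contain 6 inner lists
comparisons = per_side ** 2      # every slice of df against every slice of df2

print(per_list, per_side, comparisons)  # 28 168 28224
```

For inputs this small the work is negligible, but the number of comparisons grows roughly with the fourth power of the list length.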
For simplicity, my approach is not optimized in any way. I stuck to basic for loops and simply brute-forced through it.
First, I compute all possible contiguous sequences (slices) of the inner lists in both arrays and store them in two lists (sequences_1 and sequences_2).
Afterward, I compare all the sequences and store the matches in a set. Using a set automatically gets rid of all duplicates.
matches now holds all possible matches of the two arrays.
df = [
[[1, 5, 7, 9, 12, 13, 17], [2, 17, 18, 23, 32, 34, 45], [3, 5, 11, 33, 34, 36, 45]],
[[6, 21, 22, 50, 56, 58, 72], [7, 5, 12, 13, 55, 56, 74], [8, 23, 24, 32, 56, 58, 64]],
]
df2 = [
[[100, 5, 12, 15, 27, 32, 54], [120, 10, 17, 18, 19, 43, 55], [99, 21, 32, 33, 34, 36, 54]],
[[41, 16, 32, 45, 66, 67, 76], [56, 10, 11, 43, 54, 55, 56], [77, 12, 16, 18, 19, 21, 23]],
]
# Collect every contiguous slice of every inner list in df.
sequences_1 = []
for el in df:
    for lis in el:
        length = len(lis)
        for i in range(0, length):
            for n in range(i + 1, length + 1):
                sequences_1.append(lis[i:n])

# Do the same for df2.
sequences_2 = []
for el in df2:
    for lis in el:
        length = len(lis)
        for i in range(0, length):
            for n in range(i + 1, length + 1):
                sequences_2.append(lis[i:n])

# Compare every pair of slices; tuples are hashable, so they can go into a set.
matches = set()
for seq_1 in sequences_1:
    for seq_2 in sequences_2:
        if seq_1 == seq_2:
            matches.add(tuple(seq_1))

print(matches)
As matches, I get:
{(5,), (11,), (17,), (23,), (17, 18), (32,), (55, 56), (56,), (5, 12), (34, 36), (34,), (33, 34, 36), (55,), (33, 34), (12,), (18,), (21,), (33,), (36,), (45,)}
This holds all the matches you presented as a possible output, plus additional ones you missed when posting your question.
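If the pairwise comparison becomes too slow, one possible variation is to build a set of slices for each side and intersect the two sets, which avoids the nested comparison loop. This is only a sketch of the same idea, not a tuned solution. The helper name all_slices is hypothetical, and the data is copied from the question so that the snippet runs on its own.

```python
df = [
    [[1, 5, 7, 9, 12, 13, 17], [2, 17, 18, 23, 32, 34, 45], [3, 5, 11, 33, 34, 36, 45]],
    [[6, 21, 22, 50, 56, 58, 72], [7, 5, 12, 13, 55, 56, 74], [8, 23, 24, 32, 56, 58, 64]],
]
df2 = [
    [[100, 5, 12, 15, 27, 32, 54], [120, 10, 17, 18, 19, 43, 55], [99, 21, 32, 33, 34, 36, 54]],
    [[41, 16, 32, 45, 66, 67, 76], [56, 10, 11, 43, 54, 55, 56], [77, 12, 16, 18, 19, 21, 23]],
]


def all_slices(nested):
    """Return every non-empty contiguous slice of every inner list, as tuples."""
    slices = set()
    for group in nested:
        for lis in group:
            length = len(lis)
            for i in range(length):
                for n in range(i + 1, length + 1):
                    slices.add(tuple(lis[i:n]))
    return slices


# Set intersection keeps only the slices that occur on both sides.
matches = all_slices(df) & all_slices(df2)

# Sort by length, then by value, so the output order is stable.
for match in sorted(matches, key=lambda m: (len(m), m)):
    print(match)
```

The result is the same set of tuples as above. The slices still have to be generated for both sides, so this only removes the cost of comparing every pair.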