Fastest way to check if at least one element of a set/list is in each element of a collection of lists/sets
I have the following:
list1 = {"a", "b", "c"}
list2 = [
{"a", "s", "d", "f"},
{"q", "w", "e", "c"},
{"v", "b", "n", "m"},
]
I now want to check that the elements of list1 appear somewhere in list2: each element of list2 MUST contain at least one of the elements of list1.
What I currently do is the following (I found it on Stack Overflow a while ago):
all(list1 & l for l in list2)
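For reference, the check above can be wrapped in a small self-contained sketch and run on the sample data from the question. The function name describes_all is hypothetical, not from the original post.

```python
def describes_all(list1, list2):
    # True if every set in list2 shares at least one element with list1.
    # `list1 & l` builds the intersection; an empty set is falsy.
    return all(list1 & l for l in list2)


list1 = {"a", "b", "c"}
list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
]

print(describes_all(list1, list2))        # True: a, c and b hit sets 1, 2 and 3
print(describes_all({"a", "b"}, list2))   # False: nothing hits the second set
```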
This is actually reasonably fast. However, I am now running into the issue that I have billions of different list1 candidates, so I have to come up with a faster solution. I also tried numba, but I am struggling with the nested lists, and sets are not supported.
I have a bunch of items (like the sets in list2) that can be represented by their elements. For example, the first set in list2 consists of "a", "s", "d" and "f", and each of those characters "describes" the first set in list2.
What I now want to do is find the shortest combination of elements that describes list2. For example:
list2 = [
{"a", "s", "d", "f"},
{"q", "w", "e", "c"},
{"v", "b", "n", "m"},
{"v", "l", "p", "o"},
]
Here a shortest combination that describes list2 is a, q, v (a describes the first element, q the second, and v elements 3 and 4). It is not unique: other three-element combinations such as a, c, v work too.
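The "describes" relation in this example can be made explicit with a precomputed index. The sketch below is an illustration, not something from the original thread: it maps every element to an integer bitmask whose bit i is set when the element is in list2[i]. A combination then describes list2 exactly when the OR of its masks has every bit set, so each check is a few integer operations instead of set intersections. A flat table of integers may also be easier to move into numba than nested sets, but that is an untested assumption. The helper names build_masks and covers are hypothetical.

```python
from functools import reduce
from itertools import combinations
from operator import or_


def build_masks(list2):
    # Map each element to an int whose bit i is set if list2[i] contains it.
    masks = {}
    for i, s in enumerate(list2):
        for element in s:
            masks[element] = masks.get(element, 0) | (1 << i)
    return masks


def covers(combo, masks, full):
    # The combination describes list2 iff the OR of its masks sets every bit.
    return reduce(or_, (masks[e] for e in combo), 0) == full


list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
    {"v", "l", "p", "o"},
]
masks = build_masks(list2)
full = (1 << len(list2)) - 1  # 0b1111 for four sets

print(covers(("a", "q", "v"), masks, full))  # True
# No pair of elements describes all four sets.
print(any(covers(pair, masks, full) for pair in combinations(masks, 2)))  # False
```

Here v is the only element whose mask has two bits set (0b1100), which is why no pair can reach all four sets.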
The way I construct list1 is:
import itertools

U = set.union(*list2)
# I loop over the combinations to find the minimum one: combinations(U, 2), combinations(U, 3), ...
for list1 in itertools.combinations(U, 3):
    ...
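A complete, runnable version of that loop, trying sizes 1, 2, 3, ... and stopping at the first size that works, could look like the sketch below. It is an assumption about what the elided loop body does. It uses the isdisjoint form of the check and converts each tuple from itertools.combinations to a set, because the `&` operator does not work on tuples. The function name shortest_description is hypothetical, and the search is still exponential in the worst case.

```python
import itertools


def shortest_description(list2):
    # Brute force: try every combination of size k = 1, 2, ... drawn from the
    # union of all elements; the first hit is a minimum-size description.
    if not list2:
        return set()  # nothing to describe
    U = set.union(*list2)
    for k in range(1, len(U) + 1):
        # sorted() only makes the search order (and the answer) deterministic
        for combo in itertools.combinations(sorted(U), k):
            candidate = set(combo)  # combinations() yields tuples, not sets
            # candidate describes list2 iff it is not disjoint from any set
            if not any(map(candidate.isdisjoint, list2)):
                return candidate
    return None  # reached only if some set in list2 is empty


list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
    {"v", "l", "p", "o"},
]
print(sorted(shortest_description(list2)))  # ['a', 'c', 'v']
```

Because of the sorted search order, it returns a, c, v rather than a, q, v. Both have the minimum size of 3.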
This works really well, even for very large numbers (hundreds of millions of combinations), but it is still somewhat limited. I would like to reduce the work as much as I can.
Edit: the data structure for list2 is as described above, a collection of sets containing strings (in my case, 3-character combinations), so list1 is also a set of strings.
Thanks.
There is a simple optimization you can make:
not any(map(list1.isdisjoint, list2))
isdisjoint avoids computing the full intersection (it returns as soon as it finds a shared element), and map is faster than a generator expression when you are just calling a single method.
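A quick way to confirm that the two forms agree, and to compare them on your own data, is a small timeit run like the sketch below. The random data is made up for illustration. It is built so that list1 hits every set, which forces both versions to scan all of list2. Timings depend on the machine and the data, so no numbers are claimed here.

```python
import random
import string
import timeit

random.seed(0)
alphabet = string.ascii_lowercase

# Hypothetical test data: list1 hits every set, so both checks must scan
# all of list2 (the worst case, since neither can stop early).
list1 = set("abcde")
list2 = [
    set(random.sample(alphabet, 3)) | {random.choice("abcde")}
    for _ in range(1000)
]


def with_intersection():
    # Original form: builds a new intersection set for every element of list2.
    return all(list1 & l for l in list2)


def with_isdisjoint():
    # Suggested form: isdisjoint stops at the first shared element and
    # never allocates a result set.
    return not any(map(list1.isdisjoint, list2))


assert with_intersection() == with_isdisjoint()
print("intersection:", timeit.timeit(with_intersection, number=200))
print("isdisjoint:  ", timeit.timeit(with_isdisjoint, number=200))
```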
However, if you want a better result, you have to give more detail about what you are trying to do. In particular, what are the sizes of all of the data structures, and what elements do they contain?
what I now want to do is find the shortest combination to describe list2
This is the Hitting Set Problem, which is well studied and for which multiple solvers exist, like this one.
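For context, minimum hitting set is NP-hard in general (it is equivalent to set cover), which is why exhaustive search over combinations grows so quickly. If an approximate answer is acceptable, the classic greedy heuristic is a cheap alternative. It repeatedly picks the element that hits the most still-unhit sets. The sketch below is that textbook greedy method, not the solver linked above, and it does not guarantee a minimum. The name greedy_hitting_set is hypothetical.

```python
from collections import Counter


def greedy_hitting_set(list2):
    # Greedy heuristic: repeatedly pick the element contained in the most
    # sets that are not hit yet. Fast, but NOT guaranteed to be minimal.
    unhit = list(list2)
    chosen = set()
    while unhit:
        counts = Counter(e for s in unhit for e in s)
        if not counts:
            raise ValueError("an empty set can never be hit")
        best, _ = counts.most_common(1)[0]
        chosen.add(best)
        # drop every set the chosen element hits
        unhit = [s for s in unhit if best not in s]
    return chosen


list2 = [
    {"a", "s", "d", "f"},
    {"q", "w", "e", "c"},
    {"v", "b", "n", "m"},
    {"v", "l", "p", "o"},
]
result = greedy_hitting_set(list2)
print(len(result), "v" in result)  # 3 True
```

The size of the greedy result can also serve as an upper bound for an exact search, which then only needs to try smaller sizes.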