Optimize intersection of large sets
The premise is simple: I have two integers, a and b, and I want to find every i such that a + i and b + i are both in a given list. The list rs is very large (10e9 items). I have the following code:
def getlist(a, b):
    a1 = set([i - a for i in rs if i > a])
    b1 = set([i - b for i in rs if i > b])
    tomp = list(a1.intersection(b1))
    return tomp
The issue at hand is that a1 and b1 are both fully computed before the intersection is taken, which creates a memory problem. Can I optimize my code somehow? General comments about the method are also welcome.
Example input:
rs = [4,9,16]
a = 3
b = 8
Expected output:
getlist(3,8) = [1]
You can reduce the memory usage by skipping the creation of the second set (and of the intermediate lists):
def getlist(a, b):
    a1 = {i - a for i in rs if i > a}
    return [i - b for i in rs if i > b and i - b in a1]
The time and space complexity of this solution is O(n).
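As a quick check of the one-set approach, here is a minimal sketch run on the example input from the question. The function name getlist_one_set and the explicit rs parameter are assumptions made for illustration so the snippet is self-contained; the original reads rs as a global.

```python
def getlist_one_set(rs, a, b):
    # Build only one set of shifted values (the offsets relative to a).
    a1 = {i - a for i in rs if i > a}
    # Scan rs a second time for b and keep the offsets that also occur in a1;
    # no second set and no intermediate list is materialized.
    return [i - b for i in rs if i > b and i - b in a1]


rs = [4, 9, 16]
result = getlist_one_set(rs, 3, 8)
print(result)  # [1]
```

One difference from the set-intersection version is worth keeping in mind: if rs contains duplicate values, this list comprehension can return the same offset more than once.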
If rs is already a set, this would be faster:
def getlist(a, b):
    return [i - a for i in rs if i > a and b + (i - a) in rs]
If it is not, then you have to build the set first (otherwise the membership test in the algorithm above would be very slow), and the performance is essentially the same as before:
def getlist(a, b):
    rs_set = set(rs)
    return [i - a for i in rs_set if i > a and b + (i - a) in rs_set]
However, if you are going to call the function many times with different a and b values but the same rs, you can convert rs to a set once and reuse it every time.
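One way to reuse the set across calls is to build it once and capture it in a closure. This is only a sketch under that assumption: the factory name make_getlist is hypothetical and not part of the original answer.

```python
def make_getlist(rs):
    # Convert rs to a set once; every call to the returned function reuses it.
    rs_set = set(rs)

    def getlist(a, b):
        # i - a is a positive offset with a + offset in rs_set;
        # keep it only if b + offset is in rs_set as well.
        return [i - a for i in rs_set if i > a and b + (i - a) in rs_set]

    return getlist


getlist = make_getlist([4, 9, 16])
print(getlist(3, 8))  # [1]
print(getlist(0, 5))  # [4]
```

The cost of set(rs) is paid a single time, so each later call is one pass over the set with constant-time membership tests on average.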