Optimizing the intersection of large sets

Question

The premise is simple: I have two integers, a and b, and I want to find every i such that a + i and b + i are both in a given list. The list rs is very large (10e9 items). I have the following code:

def getlist(a,b):
    a1 = set([i - a for i in rs if i>a])
    b1 = set([i-b for i in rs if i>b]) 

    tomp = list(a1.intersection(b1))
    return tomp

The issue is that a1 and b1 are both fully computed before the intersection is taken, which creates a memory problem. Can I optimize my code somehow? General comments about the method are also welcome.

Example input:

rs = [4,9,16]
a = 3
b = 8

Expected output:

getlist(3,8) = [1]

Answer 1

You can reduce the memory usage by skipping the creation of the second set (and of the intermediate lists):

def getlist(a, b):
    a1 = {i - a for i in rs if i > a}
    return [i - b for i in rs if i > b and i - b in a1]

The time and space complexity of this solution is O(n).
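
As a quick check, here is a self-contained sketch of the same idea run on the example input from the question. The name common_offsets and passing rs as a parameter (rather than reading a global) are assumptions made only for illustration.

```python
def common_offsets(rs, a, b):
    """Return every i > 0 such that a + i and b + i are both in rs."""
    # Only one set is materialised: the offsets reachable from a.
    offsets_from_a = {x - a for x in rs if x > a}
    # Walk rs a second time and keep the offsets from b that also occur from a.
    return [x - b for x in rs if x > b and x - b in offsets_from_a]


rs = [4, 9, 16]
result = common_offsets(rs, 3, 8)
print(result)  # [1]
```

The set still holds up to n offsets, so this roughly halves the peak memory of the original code rather than removing the cost entirely.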

Answer 2

If rs is already a set, this would be faster:

def getlist(a, b):
    return [i - a for i in rs if i > a and b + (i - a) in rs]

If it is not, you have to build the set first (otherwise the membership tests above would each scan the whole list and the algorithm would be very slow), and the performance is essentially the same as in Answer 1:

def getlist(a, b):
    rs_set = set(rs)
    return [i - a for i in rs_set if i > a and b + (i - a) in rs_set]

However, if you are going to call the same function many times with different a and b values but the same rs, you can convert rs to a set once and reuse it on every call.
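
The reuse pattern can be sketched as follows. The function name common_offsets_in_set, the frozenset, and the second query pair (0, 5) are hypothetical choices for illustration; sorted() is added only to make the output order deterministic, since sets have no defined iteration order.

```python
def common_offsets_in_set(rs_set, a, b):
    """Return every i > 0 such that a + i and b + i are both in rs_set."""
    # For each x above a the candidate offset is x - a; keep it when
    # b + offset is also a member. No extra set is built per call.
    return sorted(x - a for x in rs_set if x > a and b + (x - a) in rs_set)


# Build the set once (the expensive step) and reuse it for every query.
rs_set = frozenset([4, 9, 16])
queries = [(3, 8), (0, 5)]
results = {q: common_offsets_in_set(rs_set, *q) for q in queries}
print(results)  # {(3, 8): [1], (0, 5): [4]}
```

Each call is still O(n) in time, but the only large structure in memory is the shared set.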
