Compare each element of a list with 2 elements of another list, and make the code efficient

Question

I have 2 lists:

threshold=[0.123,0.108,0.102,0.087]
retention=[0.19,1,0.57,5,0.09]

I want to find out, for each element of retention, whether it falls between elements of the threshold list.

My code below illustrates this:

ca2=[(b>retention[0]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca3=[(b>retention[1]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca4=[(b>retention[2]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca5=[(b>retention[3]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca6=[(b>retention[4]>a) for b,a in zip(threshold[::1],threshold[1::1])]

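As you can see, it checks which pair of adjacent threshold elements retention[0] lies between.

I need to compare every element of retention. My code works, but it is redundant and I don't think it is efficient. I would like each retention value to be compared automatically against every pair of adjacent elements in threshold. I'd appreciate any guidance or help making the code more efficient, because the retention list may get much longer.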

Answer 1

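You can build a dictionary with the retention values as keys and the lists of threshold comparisons as values. Also, if you convert the zip object to a list once, you don't need to create a new zip object on every iteration.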

t = list(zip(threshold, threshold[1:]))
print({i: [(b > i > a) for b, a in t] for i in retention})
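
One caveat with the dictionary: duplicate values in retention collapse into a single key. The following is a minimal sketch, assuming you would rather keep one result per position and also want the index of the matching pair. The helper name `pair_flags` is made up for illustration and is not part of the answer above.

```python
threshold = [0.123, 0.108, 0.102, 0.087]
retention = [0.19, 1, 0.57, 5, 0.09]

# Build the adjacent (upper, lower) pairs once and reuse them.
pairs = list(zip(threshold, threshold[1:]))

def pair_flags(value):
    """One boolean per adjacent pair: True where upper > value > lower."""
    return [upper > value > lower for upper, lower in pairs]

# A list of (value, flags) tuples keeps duplicates and preserves order.
results = [(r, pair_flags(r)) for r in retention]

for value, flags in results:
    # Index of the first matching pair, or None when no pair contains the value.
    hit = flags.index(True) if True in flags else None
    print(value, flags, hit)
```

With the question's data only 0.09 matches, in the last pair (0.102, 0.087).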

Answer 2

To check whether each retention element lies between two threshold elements, you can use bisect (O(log n) time per check).

Code

from bisect import bisect_left

def binary_search(a, x):
    """Index of where x would be inserted into a (a must be sorted ascending)
       return None if x <= min(a) or x > max(a)"""
    i = bisect_left(a, x)
    return i if i != len(a) and i > 0 else None

threshold = [0.123,0.108,0.102,0.087]
threshold_asc = threshold[::-1]
retention = [0.123, 0.19,1,0.57,5,0.09, 0.087]

for r in retention:
  print(f'{r} ->> {binary_search(threshold_asc, r)}')

Output

0.123 ->> 3
0.19 ->> None
1 ->> None
0.57 ->> None
5 ->> None
0.09 ->> 1
0.087 ->> None

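Complexity

O(log(N)) for each retention check. This is more efficient than walking the threshold list to find the surrounding pair of values, which is O(N).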
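
To see that a bisect lookup can agree with the pairwise scan from the question, here is a small sketch that runs both on the same inputs. It assumes threshold is strictly descending and that a value equal to a threshold should not count as "between" (the strict `b > x > a` test from the question). The function names are illustrative only.

```python
from bisect import bisect_left

threshold = [0.123, 0.108, 0.102, 0.087]
threshold_asc = threshold[::-1]  # reverse once; bisect needs ascending order

def pair_index_bisect(asc, x):
    """Index j (in the descending list) with t[j] > x > t[j + 1], else None."""
    i = bisect_left(asc, x)
    # i == 0: x <= min; i == len(asc): x > max; asc[i] == x: x sits on a threshold.
    if i == 0 or i == len(asc) or asc[i] == x:
        return None
    return len(asc) - 1 - i  # map the ascending position back to the pair index

def pair_index_scan(desc, x):
    """Same answer by scanning every adjacent pair, O(N) per lookup."""
    for j, (upper, lower) in enumerate(zip(desc, desc[1:])):
        if upper > x > lower:
            return j
    return None

samples = [0.123, 0.19, 1, 0.57, 5, 0.09, 0.087, 0.105, 0.11]
for x in samples:
    found = pair_index_bisect(threshold_asc, x)
    assert found == pair_index_scan(threshold, x)
    print(x, found)
```

Reversing the list is done once up front; doing it inside the lookup would cost O(N) per call and cancel the benefit of the binary search.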

Answer 3

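I'm not entirely sure what you are trying to achieve, but you can use bisect to binary-search the threshold list for the threshold just below a given number.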

import bisect

retention = [0.19, 1, 0.57, 5, 0.09]
threshold = [0.123, 0.108, 0.102, 0.087]
threshold = [0] + sorted(threshold) # add 0 and sort
bins = {t: [] for t in threshold}
for r in retention:
    k = bisect.bisect(threshold, r) # actually, this is the next threshold
    bins[threshold[k-1]].append(r)  # thus k-1 here to get the lower one
# {0: [], 0.087: [0.09], 0.102: [], 0.108: [], 0.123: [0.19, 1, 0.57, 5]}

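As with the other bisect answer (which produces quite different output), the complexity is O(log n) per query, where n is the number of thresholds, for a total of O(k log n) for the k elements in retention.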
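
Part of the difference between the two bisect answers is the function they call: Answer 2 uses bisect_left, while this answer uses bisect.bisect, which behaves like bisect_right. A short sketch of how the two treat a value that is exactly equal to a threshold (the sample values are chosen here for illustration):

```python
import bisect

threshold_asc = [0.087, 0.102, 0.108, 0.123]

for x in (0.09, 0.102, 0.123):
    left = bisect.bisect_left(threshold_asc, x)   # insertion point before equal items
    right = bisect.bisect(threshold_asc, x)       # insertion point after equal items
    print(x, left, right)

# 0.09 1 1
# 0.102 1 2
# 0.123 3 4
```

The two only disagree when the value sits exactly on a threshold, so the choice decides which side such boundary values fall on.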

Answer 4

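If you use numpy, take a look at the numpy.searchsorted function, which is similar to the bisect functions mentioned in the other answers.

import numpy as np
np.searchsorted(sorted(threshold), retention)

This gives you the indices at which the retention values would be inserted into the sorted thresholds.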

Answer 5

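I can see two other approaches:

If threshold is sorted, and you want to know whether there exists a j such that threshold[j] > retention[i] >= threshold[j+1] (note the >=) but you don't need the value of j, then you only need to check threshold[0] > retention[i] >= threshold[-1]. If you need retention[i] to lie strictly between two consecutive values of threshold, this will not work. This gives you an O(1) check per element and therefore an O(n) algorithm.
If retention is sorted, you can use a two-pointer approach. For example, assume retention is in descending order, like threshold. Compare retention[i] with threshold[0] and threshold[1]. If retention[i] < threshold[1], the value lies below that pair, so you now compare retention[i] with threshold[1] and threshold[2], and so on. Otherwise, record the result for retention[i] and increase i, i.e. move to the next value of retention. Each element's check is amortized O(1) here, so unless you need to sort retention first, this is also an O(n) algorithm.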
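
Both approaches can be sketched as follows. This is a rough illustration under stated assumptions, not the answerer's own code: threshold and retention are both sorted in descending order, the non-strict test threshold[j] > value >= threshold[j+1] is used, and the pair pointer advances whenever the current value falls below the current pair. The function names are hypothetical.

```python
def in_threshold_range(threshold_desc, value):
    """Approach 1: O(1) test that some j has t[j] > value >= t[j + 1]."""
    return threshold_desc[0] > value >= threshold_desc[-1]

def two_pointer_pairs(threshold_desc, retention_desc):
    """Approach 2: for each value, the j with t[j] > value >= t[j + 1], else None.

    Both inputs must be sorted in descending order.
    """
    result = []
    j = 0
    last_pair = len(threshold_desc) - 2
    for value in retention_desc:
        # Skip pairs lying entirely above this value. Later values are no
        # larger, so j never has to move back (amortized O(1) per value).
        while j <= last_pair and value < threshold_desc[j + 1]:
            j += 1
        if j <= last_pair and value < threshold_desc[j]:
            result.append(j)
        else:
            result.append(None)  # at or above threshold[0], or below threshold[-1]
    return result

threshold = [0.123, 0.108, 0.102, 0.087]
retention = sorted([0.19, 1, 0.57, 5, 0.09], reverse=True)

print([in_threshold_range(threshold, r) for r in retention])
print(two_pointer_pairs(threshold, retention))
```

Sorting retention first costs O(k log k), so the two-pointer pass only pays off when the list is already sorted or the sorted order is needed anyway.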

5 solutions

Solution 1
1 2020-03-13 13:32:57

Solution 2
1 Accepted 2020-03-13 13:36:04

Solution 3
1 2020-03-13 13:51:29

Solution 4
0 2020-03-28 01:34:26

Solution 5
0 2020-03-28 07:23:03
