A more efficient way to filter a list

Question

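(Disclaimer: I'm not a good programmer, so I apologize in advance if my wording is imprecise.)

I have two lists. I want to identify every item in the second list ("slope") that is less than -1 and has no earlier duplicate, and then return the corresponding items from the first list ("counter") for which both conditions hold. (New list: "filter_count".)

This line of code does the job: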

filter_count = [i for i,j in zip(counter, slope) if i-1 == slope.index( slope[int(i-1)] ) and j<=-1]

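But when I put a timer function between each of my lines of code, I see that the whole script takes about 13 seconds to run, and this particular line accounts for roughly 99.9% of that time. I assume I've written an extremely inefficient line here, and I'm wondering whether there is a simpler way to do it.

For reference, this is an example of how filter_count should work with these lists (in my real case they are much longer).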

Answer 1

Use a dictionary to record the index at which each element first appears, then iterate over it to get the result:

>>> mapping = {}
>>> for i, elem in zip(counter, slope):    # or enumerate(slope, 1)
...     mapping.setdefault(elem, i)
...
>>> [i for elem, i in mapping.items() if elem < -1]
[1, 3, 4, 5, 7, 10, 11, 13]

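Note: this relies on dictionaries preserving insertion order, which is guaranteed in Python 3.7+. If you are on an earlier Python version, consider using collections.OrderedDict, or a set together with a list.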
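For older Python versions, the note above could be sketched with collections.OrderedDict roughly as follows. The function name first_occurrence_filter and its threshold parameter are illustrative assumptions, not part of the original answer; the sample lists are the ones used elsewhere in this thread.

```python
from collections import OrderedDict


def first_occurrence_filter(counter, slope, threshold=-1):
    """Return counter values whose slope is below threshold and not seen earlier."""
    first_seen = OrderedDict()  # keeps insertion order on any Python version
    for i, elem in zip(counter, slope):
        # setdefault stores i only the first time elem appears
        first_seen.setdefault(elem, i)
    return [i for elem, i in first_seen.items() if elem < threshold]


counter = list(range(1, 16))
slope = [-229, -229, -13, -67, -99.4, -13, -43.8, -67, -13, -34.6, -52.2, -13, -29.6, 2.4, -13]
print(first_occurrence_filter(counter, slope))  # [1, 3, 4, 5, 7, 10, 11, 13]
```

On Python 3.7+ a plain dict behaves the same way here, so OrderedDict is only needed for backward compatibility.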

Answer 2

Use a set to skip values that have already appeared:

counter = list(range(1, 16))
slope = [-229, -229, -13, -67, -99.4, -13, -43.8, -67, -13, -34.6, -52.2, -13, -29.6, 2.4, -13]
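s = set()  # slope values that have already been emitted
filter_count = []
for i, e in zip(counter, slope):
    # keep only the first occurrence of each value at or below the threshold
    if e not in s and e <= -1:  # the question's one-liner uses j <= -1
        s.add(e)
        filter_count.append(i)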

print(filter_count)

Prints:

[1, 3, 4, 5, 7, 10, 11, 13]
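As a quick sanity check, the sketch below compares a set-based loop against the one-liner from the question on the same sample lists. The helper name filter_with_set is hypothetical, and the `<= -1` threshold is an assumption taken from the question's code rather than from its prose.

```python
counter = list(range(1, 16))
slope = [-229, -229, -13, -67, -99.4, -13, -43.8, -67, -13, -34.6, -52.2, -13, -29.6, 2.4, -13]

# The one-liner from the question (quadratic because of slope.index)
expected = [i for i, j in zip(counter, slope)
            if i - 1 == slope.index(slope[i - 1]) and j <= -1]


def filter_with_set(counter, slope):
    """Single pass: remember every value seen, keep first occurrences <= -1."""
    seen = set()
    result = []
    for i, e in zip(counter, slope):
        if e not in seen:
            seen.add(e)  # record the value so later repeats are skipped
            if e <= -1:
                result.append(i)
    return result


assert filter_with_set(counter, slope) == expected
print(expected)  # [1, 3, 4, 5, 7, 10, 11, 13]
```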

Answer 3

Update: (larger test list)

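The problem is that you run an index search for every element. Each search is a linear scan, so the overall complexity of your algorithm is O(n²).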
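To make the quadratic growth concrete, here is a small sketch that counts the comparisons a linear scan performs when it is run once per element. The helper count_index_comparisons is hypothetical; it only mimics the front-to-back scan that list.index does and is not taken from the original answers.

```python
def count_index_comparisons(values):
    """Count equality checks made when searching for every element in turn."""
    comparisons = 0
    for target in values:
        for candidate in values:  # scan from the front, like list.index
            comparisons += 1
            if candidate == target:
                break
    return comparisons


for n in (100, 200, 400):
    print(n, count_index_comparisons(list(range(n))))
# 100 5050
# 200 20100
# 400 80200
```

For a list of n distinct values the count is n(n+1)/2, so doubling the list size roughly quadruples the work.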

The set and dict solutions are better because they trade memory for speed. Here is a very rough, quick script for comparing performance:

import sys
from timeit import timeit


def generate_list(size: int) -> list:
    return list(range(-size//2, size//2))


def original_solution(counter: list, slope: list) -> list:
    filter_count = [i for i, j in zip(counter, slope) if i-1 == slope.index(slope[int(i-1)]) and j <= -1]
    return filter_count


def using_dictionary(counter: list, slope: list) -> list:
    """Inspired by Mechanic Pig"""
    mapping = {}
    for i, elem in zip(counter, slope):
        mapping.setdefault(elem, i)
    filter_count = [i for elem, i in mapping.items() if elem < -1]
    return filter_count


def using_set(counter: list, slope: list) -> list:
    """Inspired by Алексей Р"""
    s = set()
    filter_count = []
    for i, e in zip(counter, slope):
        if e not in s and e < 0:
            s.add(e)
            filter_count.append(i)
    return filter_count


def main():
    size, n = sys.argv[1:]
    print(f"List size: {size}, n: {n}")
    print("Time in seconds:")
    for func in [original_solution, using_dictionary, using_set]:
        name = func.__name__
        t = timeit(
            f'counter = slope = generate_list({size}); '
            f'{name}(counter, slope)',
            setup=f'from __main__ import generate_list, {name}',
            number=int(n)
        )
        print(f'{name:<18}', round(t, 4))


if __name__ == '__main__':
    main()

You can call the script with the desired list size and the number of repetitions:

python test_script.py 10000 10

Result:

List size: 10000, n: 10
Time in seconds:
original_solution  3.5511
using_dictionary   0.0122
using_set          0.0093

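My guess is that the set solution is slightly better because adding and lookup are both O(1) on average and there is only one loop, which makes the whole algorithm O(n).

The dict solution is technically also O(n), but it has to run two loops, which means up to 2 * n iterations in practice. So in the limit it is equivalent to the set solution, but in practice it will probably always be a bit slower.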

Answer 4

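This solution does not give exactly the same result, but it still returns one index for each unique value. For example, for the duplicate pair at positions 1 and 2 it returns 2 rather than 1, unlike the result in the question. So each unique value is still reported once, although by its last occurrence rather than its first.
If you want exactly the same result, you need to create a temporary dict (index: val) and reverse its order before creating slope_d (val: index).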

# create list for testing
slope = [-229, -229, -13, -67, -99.4, -13, -43.85, -67, -13, -34.6, -52.27, -13, -29.61, 2.42, -13]

# transform to a dict with the values as keys
slope_d = {val: idx+1 for idx, val in enumerate(slope) if val <= -1}
# here lies the trick: since the keys of a dict are unique, 
# they will be overwritten if there's the same value present in the original list

# transform to list for output
filter_count = list(slope_d.values())
filter_count.sort()

print(filter_count)

Result:

[2, 5, 7, 8, 10, 11, 13, 15]
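One way to get first-occurrence indices while keeping the same dict-comprehension idea is to walk the list back to front, so that earlier positions overwrite later ones. This is only a sketch of the reversal described above, using reversed over enumerate instead of a temporary dict; that substitution is my assumption, not the answer's exact recipe.

```python
slope = [-229, -229, -13, -67, -99.4, -13, -43.85, -67, -13, -34.6, -52.27, -13, -29.61, 2.42, -13]

# Pair each value with its 1-based position, then iterate from the end
pairs = reversed(list(enumerate(slope, 1)))

# Later writes win in a dict, so the earliest position is the one that survives
slope_d = {val: idx for idx, val in pairs if val <= -1}

filter_count = sorted(slope_d.values())
print(filter_count)  # [1, 3, 4, 5, 7, 10, 11, 13]
```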

Answer 5

Try this:

filter_count = [
    count for i, count in enumerate(counter)
    # look for earlier duplicates in slope, not counter; the slice scan keeps this O(n²)
    if slope[i] < -1 and slope[i] not in slope[:i]
]
