How to remove tuples from a list when a tuple contains a None value

Question

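I'm working on asynchronous code that makes thousands of requests. Each request is saved as a tuple of an id and the response, which is then appended to a task list.

Usually I end up with a list of more than 4000 tuples.

After running the code, I get a list like this: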

responses = [(00001, {"code": 0, "foo": "bar"}), (00002, {"code": 0, "foo": "bar"}), (00003, {"code": 0, "foo": "bar"}), (00004, None), (00005, None), (00006, {"code": 0, "foo": "bar"})]

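Since I only need the tuples that have a JSON response, I want to remove every tuple whose second element (index 1) is None.

I'm currently iterating over the list and appending only the "valid" tuples, the ones without a None value, to a new list, but that isn't very performant.

Is there a way to remove the tuples containing None without having to iterate over them one by one?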
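For reference, the loop-and-append approach described above presumably looks something like `keep_valid_loop` below; it is a reconstruction, since the original loop isn't shown. `keep_valid_comprehension` is the equivalent list comprehension. Both still look at every tuple once: there is no way to drop elements by value without examining each of them. The function names are illustrative, and the sample data writes the ids as plain ints so that it is valid Python 3.

```python
def keep_valid_loop(responses):
    # Hypothetical reconstruction of the loop described in the question
    valid = []
    for item in responses:
        if item[1] is not None:  # index 1 holds the JSON response
            valid.append(item)
    return valid


def keep_valid_comprehension(responses):
    # Same result, written as a list comprehension
    return [item for item in responses if item[1] is not None]


responses = [
    (1, {"code": 0, "foo": "bar"}),
    (2, {"code": 0, "foo": "bar"}),
    (3, {"code": 0, "foo": "bar"}),
    (4, None),
    (5, None),
    (6, {"code": 0, "foo": "bar"}),
]

print([id_ for id_, _ in keep_valid_comprehension(responses)])  # [1, 2, 3, 6]
```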

Answer 1

TL;DR: a list comprehension performs best.

The built-in filter comes next, and multiprocessing.Pool is overkill, i.e. the slowest.

I tested all of them on my machine with Python 3.10.2. Output:

$ python main.py 
  288.01 mks in filter_LC([(1, {'code': 0, 'foo'...)  # List comprehension
  469.21 mks in filter_builtin([(1, {'code': 0, 'foo'...) # Builtin filter
   15.28 ms in filter_pool([(1, {'code': 0, 'foo'...) # Multiprocessing

Test code:

from multiprocessing import Pool  # process-based pool
# from multiprocessing.dummy import Pool  # thread-based pool: performs better than processes, but only slightly
from funcy import print_durations  # third-party: pip install funcy

responses = [(1, {"code": 0, "foo": "bar"}),
             (2, None),
             (3, {"code": 0, "foo": "bar"}),
             (4, None),
             (5, None),
             (6, {"code": 0, "foo": "bar"})] * 1000

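@print_durations
def filter_LC(responses):
    # List comprehension: keep tuples whose response (index 1) is not None
    return [c for c in responses if c[1] is not None]

@print_durations
def filter_builtin(responses):
    # Built-in filter(); the lambda is called once per tuple
    return list(filter(lambda c: c[1] is not None, responses))

# Helpers for filter_pool()

def valid(x):
    # Module-level function so the process pool can pickle it
    return len(x) >= 2 and x[1] is not None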

def pool_filter(pool, func, candidates):
    return [c for c, keep in zip(candidates, pool.map(func, candidates)) if keep]

@print_durations
def filter_pool(responses, pool_size=5):
    with Pool(pool_size) as p:
        return pool_filter(p, valid, responses)

if __name__ == "__main__":
    ans = [
        filter_LC(responses),
        filter_builtin(responses),
        filter_pool(responses),
    ]
    for a in ans:
        assert a == ans[0]

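The list comprehension beats the built-in filter. My guess is that filter pays for calling the lambda on every element, an overhead the list comprehension doesn't have.

And a thread or process pool is overkill here; it's better saved for work that is more time-consuming than filtering ;)

References:

The pool_filter() snippet comes from "How to use parallel processing filter in Python?" on Stack Overflow.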
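To rerun the list comprehension vs. filter comparison without the third-party funcy dependency, here is a minimal sketch using the standard-library timeit module. It mirrors filter_LC and filter_builtin above (the lowercase names are illustrative) on data of the same shape; absolute timings will differ from machine to machine, so no numbers are claimed here.

```python
import timeit
from functools import partial

# Same shape as the benchmark data above: 6,000 tuples, half of them with None
responses = [(1, {"code": 0, "foo": "bar"}),
             (2, None),
             (3, {"code": 0, "foo": "bar"}),
             (4, None),
             (5, None),
             (6, {"code": 0, "foo": "bar"})] * 1000


def filter_lc(data):
    # List comprehension: the test is evaluated inline, no function call per item
    return [c for c in data if c[1] is not None]


def filter_builtin(data):
    # filter() calls the lambda once per item, which adds call overhead
    return list(filter(lambda c: c[1] is not None, data))


if __name__ == "__main__":
    assert filter_lc(responses) == filter_builtin(responses)
    for fn in (filter_lc, filter_builtin):
        # Best of 5 repeats of 100 calls each, reported per call
        best = min(timeit.repeat(partial(fn, responses), number=100, repeat=5))
        print(f"{fn.__name__}: {best / 100 * 1e6:.1f} µs per call")
```

Taking the minimum of several repeats is the usual timeit practice, since slower runs mostly reflect interference from other processes rather than the code being measured.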

Answer 2

You can try using Python's filter(). For example:

valids = list(filter(lambda x: x[1] is not None, responses))

For your example responses variable, valids would be:

[(1, {'code': 0, 'foo': 'bar'}), 
 (2, {'code': 0, 'foo': 'bar'}), 
 (3, {'code': 0, 'foo': 'bar'}), 
 (6, {'code': 0, 'foo': 'bar'})]

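By the way, leading zeros on decimal integer literals are not allowed in Python 3, so the list in the question only parses in Python 2.x (where a leading 0 marks an octal literal).

Whether this is more efficient than a list comprehension, I can't say for sure, although a blog post suggests it may not be. Under the hood, Python probably still iterates over the tuples one by one, but at least that isn't reflected in the semantics of the code.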
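In Python 3 the ids therefore have to be written without leading zeros. If the zero-padded form matters, for example for display, one option is to keep the ids as plain ints and pad them only when formatting. This is a sketch under that assumption: the five-digit width is taken from the question's example, and the question doesn't say how the ids are actually used.

```python
# Python 3 version of the sample data: ids written as plain ints (no leading zeros)
responses = [
    (1, {"code": 0, "foo": "bar"}),
    (4, None),
    (6, {"code": 0, "foo": "bar"}),
]

valids = list(filter(lambda x: x[1] is not None, responses))

# Zero-pad the ids to five digits only when displaying them
for request_id, payload in valids:
    print(f"{request_id:05d}", payload)
# 00001 {'code': 0, 'foo': 'bar'}
# 00006 {'code': 0, 'foo': 'bar'}

# Zero-padded strings (e.g. ids read from text) convert back with int()
print(int("00004"))  # 4
```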
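In Python 3, filter() returns a lazy iterator: no tuple is examined until the result is consumed, and then the tuples are still checked one at a time. The sketch below makes that visible by recording which ids the predicate has seen. The `checked` list and the `has_response` function are illustrative additions, not part of the answer.

```python
responses = [
    (1, {"code": 0, "foo": "bar"}),
    (2, {"code": 0, "foo": "bar"}),
    (3, {"code": 0, "foo": "bar"}),
    (4, None),
    (5, None),
    (6, {"code": 0, "foo": "bar"}),
]

checked = []  # ids the predicate has been called on so far


def has_response(item):
    checked.append(item[0])
    return item[1] is not None


valids = filter(has_response, responses)
print(checked)  # [] -- nothing has been evaluated yet

first = next(valids)
print(first[0], checked)  # 1 [1]

rest = list(valids)  # consuming the iterator checks the remaining tuples
print([i for i, _ in rest], checked)  # [2, 3, 6] [1, 2, 3, 4, 5, 6]
```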

Answer 3

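The idea behind the filter function is great and very Pythonic. It just lacks performance because it doesn't support multiprocessing, since lambdas can't be pickled by default.

map/reduce are other alternatives; see the reference here. An example solution would be: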
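The pickling point can be checked directly. A process-based Pool pickles the function it sends to its workers, and pickle serializes functions by reference (module plus qualified name), which a lambda cannot provide because its name is just `<lambda>`. A minimal sketch, where `is_picklable` is a hypothetical helper:

```python
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps() can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies by object and Python version
        return False
    return True


has_response = lambda item: item[1] is not None

print(is_picklable(has_response))  # False: "<lambda>" can't be looked up by name
print(is_picklable(len))  # True: built-ins are pickled by reference
```

This is why a worker function for a process pool is normally written as a regular module-level def, as validate_request is here.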

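from multiprocessing import Pool

def validate_request(request):
    return request[1] is not None  # True only when a response is present

if __name__ == "__main__":
    with Pool() as pool:  # `requests` is the list of (id, response) tuples
        requests = [r for r, valid in zip(requests, pool.map(validate_request, requests)) if valid]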

How to remove tuples from a list when a tuple contains a None value

Problem description

3 solutions

Solution 1
1 2022-01-27 14:43:25

Solution 2
0 2022-01-27 13:42:52

Solution 3
0 2022-01-27 14:15:11
