Parallelizing a list comprehension in Python

someList = [x for x in someList if not isOlderThanXDays(x, XDays, DtToday)]

I have this line, and the function isOlderThanXDays makes some API calls, so it takes a while. I would like to perform this using multiprocessing or parallel processing in Python. The order in which the list is processed doesn't matter (so asynchronous, I think).

The function isOlderThanXDays essentially returns a boolean value, and the list comprehension keeps everything that is newer than XDays in the new list.

Edit: Parameters of the function: XDays is passed in by the user, say 60 days, and DtToday is today's date (a datetime object). I then make API calls to read the file's modified date from its metadata and return True if the file is older, otherwise False.
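For illustration only, a function of that shape might look like the sketch below. The fetch_modified_date helper and the FAKE_MODIFIED_DATES table are hypothetical stand-ins for the real API call, which is not shown in the question.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the real metadata API: maps a file id to its
# last-modified timestamp. A real implementation would make a network call.
FAKE_MODIFIED_DATES = {
    "report.txt": datetime(2024, 1, 1),
    "notes.txt": datetime(2024, 5, 20),
}


def fetch_modified_date(file_id):
    """Hypothetical helper: return the file's last-modified datetime."""
    return FAKE_MODIFIED_DATES[file_id]


def isOlderThanXDays(x, XDays, DtToday):
    """Return True if file x was last modified more than XDays before DtToday."""
    return DtToday - fetch_modified_date(x) > timedelta(days=XDays)


DtToday = datetime(2024, 6, 1)
someList = ["report.txt", "notes.txt"]
# Sequential version of the filter from the question.
someList = [x for x in someList if not isOlderThanXDays(x, 60, DtToday)]
```

With these made-up dates, only "notes.txt" is within 60 days of DtToday, so it is the only item kept.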

I am looking for something similar to the question below. The difference is that in that question there is an output for every list input, whereas mine filters the list based on the boolean value returned by the function, so I don't know how to apply it in my scenario.

How to parallelize list-comprehension calculations in Python?

This should run all of your checks in parallel, and then filter out the items that failed the check.

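import multiprocessing

try:
    cpus = multiprocessing.cpu_count()
except NotImplementedError:
    cpus = 2   # arbitrary default


def MyFilterFunction(x):
    # Return the item itself if it should be kept, otherwise None.
    if not isOlderThanXDays(x, XDays, DtToday):
        return x
    return None

# On platforms that spawn worker processes (e.g. Windows), put the code
# below under an `if __name__ == "__main__":` guard.
# The with-block shuts the pool down once all results have been collected.
with multiprocessing.Pool(processes=cpus) as pool:
    parallelized = pool.map(MyFilterFunction, someList)
# Compare against None so that falsy items (0, "") are not dropped by mistake.
newList = [x for x in parallelized if x is not None]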

You can use ThreadPool:

from multiprocessing.pool import ThreadPool  # thread-based pool with the same interface as multiprocessing.Pool
from functools import partial

NUMBER_CALLS_SAME_TIME = 10  # keep this low enough to avoid API throttling
# Assumes the signature is isOlderThanXDays(x, XDays, DtToday)
my_api_call_func = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
pool = ThreadPool(NUMBER_CALLS_SAME_TIME)
responses = pool.map(my_api_call_func, someList)  # one boolean per item, in input order
pool.close()
# Keep only the items that are not older than XDays
someList = [x for x, is_old in zip(someList, responses) if not is_old]
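The same thread-based pattern can also be written with the standard library's concurrent.futures.ThreadPoolExecutor, which is a different API from the ThreadPool used above. The sketch below is self-contained: the isOlderThanXDays stub, the integer "day numbers" used as items, and the parallel_filter helper are all hypothetical and only stand in for the real API-backed check.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import time


def isOlderThanXDays(x, XDays, DtToday):
    """Stub for the real API-backed check (hypothetical logic).

    Here x and DtToday are plain day numbers instead of datetimes.
    """
    time.sleep(0.01)  # simulate network latency
    return DtToday - x > XDays


def parallel_filter(items, XDays, DtToday, max_workers=10):
    """Keep the items for which isOlderThanXDays returns False."""
    check = partial(isOlderThanXDays, XDays=XDays, DtToday=DtToday)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the inputs,
        # so each flag can be paired with its item via zip.
        flags = list(executor.map(check, items))
    return [item for item, is_old in zip(items, flags) if not is_old]


modified_days = [1, 50, 99, 100]  # made-up "last modified" day numbers
kept = parallel_filter(modified_days, XDays=60, DtToday=100)
```

Threads suit this case because the work is dominated by waiting on API responses rather than by CPU time; max_workers plays the same role as NUMBER_CALLS_SAME_TIME and should stay low enough to avoid throttling.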
