Threads are not executing in parallel in Python with ThreadPoolExecutor

I'm new to Python threading and I'm experimenting with it. When I run something in threads (judging by the output I print), it never seems to run in parallel. Also, my functions take the same time as they did before I used the concurrent.futures library (ThreadPoolExecutor). I have to calculate the gains of some attributes over a dataset (I cannot use libraries). Since I have about 1024 attributes and the function was taking about a minute to execute (and I have to call it inside a for loop), I decided to split the array of attributes into 10 parts (just as an example) and run the gain(attribute) function separately for each sub-array. So I did the following (omitting some unnecessary code):

def calculate_gains(self):
    splited_attributes = np.array_split(self.attributes, 10)
    result = {}
    for atts in splited_attributes:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future = executor.submit(self.calculate_gains_helper, atts)
            return_value = future.result()
            self.gains = {**self.gains, **return_value}

Here's calculate_gains_helper:

def calculate_gains_helper(self, attributes):
    inter_result = {}
    for attribute in attributes:
        inter_result[attribute] = self.gain(attribute)
    return inter_result

Am I doing something wrong? I read some older posts but couldn't find an answer. Thanks a lot for any help!

Python threads do not run Python code in parallel (at least in the CPython implementation) because of the GIL (global interpreter lock). Use processes and ProcessPoolExecutor to get real parallelism:

with concurrent.futures.ProcessPoolExecutor() as executor:
    ...
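To make the suggestion above concrete, here is a minimal sketch of chunked work sent to a process pool. It is not the asker's code: `gain` is a hypothetical stand-in for the real computation, and `gains_for_chunk`, `split_into_chunks`, and the chunk size of 10 are illustrative assumptions. Module-level functions are used instead of methods because arguments and callables are pickled to reach the worker processes.

```python
import concurrent.futures


def gain(attribute):
    # Hypothetical stand-in for the real gain computation: pure-Python CPU work.
    return sum(i * i for i in range(attribute * 1000))


def gains_for_chunk(chunk):
    # Runs in a worker process; defined at module level so it can be pickled.
    return {attribute: gain(attribute) for attribute in chunk}


def split_into_chunks(items, size):
    # Plain-list replacement for np.array_split, to keep the sketch dependency-free.
    return [items[i:i + size] for i in range(0, len(items), size)]


def calculate_gains(attributes, chunk_size=10):
    gains = {}
    with concurrent.futures.ProcessPoolExecutor() as executor:
        chunks = split_into_chunks(attributes, chunk_size)
        # map() submits every chunk up front and yields results in input order.
        for partial in executor.map(gains_for_chunk, chunks):
            gains.update(partial)
    return gains


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    result = calculate_gains(list(range(1, 41)))
    print(len(result))
```

If the work stays in a method such as self.calculate_gains_helper, the whole instance is pickled for each task, and anything a worker assigns to self is not visible in the parent process, so results have to be returned and merged in the parent as above.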

You submit each work item and then wait for its result before submitting the next one, so the work still runs serially and the threads only add overhead. I can't guarantee this will speed things up much, because you are still dealing with the Python GIL, which keeps Python-level code from running in parallel, but here goes.

I've created a single thread pool and pushed everything possible into the workers, including the slicing of self.attributes.

def calculate_gains(self):
    # One pool for all the work; map() queues every slice before any result is read.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        result_list = executor.map(
            self.calculate_gains_helper,
            ((i, i+10) for i in range(0, len(self.attributes), 10)))
    # Leaving the with block waits for all workers, so the results are ready here.
    for return_value in result_list:
        self.gains = {**self.gains, **return_value}

def calculate_gains_helper(self, start_end):
    # Each task receives (start, end) indices and slices self.attributes itself.
    start, end = start_end
    inter_result = {}
    for attribute in self.attributes[start:end]:
        inter_result[attribute] = self.gain(attribute)
    return inter_result
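The difference between waiting on each future immediately and queuing everything first can be seen in a small timing sketch. Everything here is illustrative: `slow_gain` is a hypothetical stand-in that sleeps instead of computing, and sleeping releases the GIL, so the speed-up it shows would not necessarily carry over to pure-Python CPU-bound work like the original gain function.

```python
import concurrent.futures
import time


def slow_gain(attribute, delay=0.05):
    # Hypothetical stand-in for self.gain(); sleep releases the GIL while waiting.
    time.sleep(delay)
    return attribute * 2


def submit_and_wait_each(attributes):
    # Pattern from the question: result() blocks before the next task is submitted.
    gains = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        for attribute in attributes:
            future = executor.submit(slow_gain, attribute)
            gains[attribute] = future.result()
    return gains


def submit_all_then_collect(attributes):
    # Queue every task first, then gather results as they finish.
    gains = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(slow_gain, a): a for a in attributes}
        for future in concurrent.futures.as_completed(futures):
            gains[futures[future]] = future.result()
    return gains


def timed(func, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    attrs = list(range(8))
    _, serial_seconds = timed(submit_and_wait_each, attrs)
    _, overlapped_seconds = timed(submit_all_then_collect, attrs)
    print(f"one at a time: {serial_seconds:.2f}s, "
          f"all submitted first: {overlapped_seconds:.2f}s")
```

Both functions return the same dictionary; only the second lets the pool keep more than one worker busy at a time.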
