
ThreadPoolExecutor finishing before all threads are actually finished

I have 28 methods that are being run in a pool. A total of 28 threads are created by ThreadPoolExecutor, an Executor subclass that uses a pool of threads to execute calls asynchronously. During the thread execution, I am using Plotly to generate some charts. My problem is that the ThreadPoolExecutor finishes before all threads are actually finished: it is hit and miss, but there are always around 4 charts (threads) that are not created (not finished). This is my code:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=len(methods))

for method in methods:
    pool.submit(method, commits, system_name, reset, chart_output)

pool.shutdown(wait=True)

The executed methods look like this:

import plotly.graph_objects as go

def commits_by_date(commits, system_name, reset, chart_output):
    collection_name = "commits_by_date"
    reset_db_data(reset, system_name, collection_name)
    date_commits = retrieve_db_data(system_name, collection_name)

    if len(date_commits) == 0:
        date_commits = commits.groupby('commit_date')[['sha']].count()
        date_commits = date_commits.rename(columns={'sha': 'commit_count'})
        date_commits.insert(0, "system_name", system_name)
        date_commits = date_commits.reset_index()
        save_df_to_db(date_commits, collection_name)

    if chart_output:
        fig = go.Figure([go.Scatter(
            x=date_commits.commit_date,
            y=date_commits.commit_count,
            text=date_commits.commit_count,
            fill='tozeroy')])
        fig.update_layout(
            title='Commits by Date',
            yaxis_title='Commits Count')
        fig.write_html('commits_by_date.html', auto_open=True)
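
For reference, pool.submit returns a concurrent.futures.Future, and an exception raised inside a submitted method is stored on that Future and only re-raised when its result() is called. A minimal sketch (reusing the methods list and arguments from the code above) that keeps the futures and checks them, so that a method failing partway through chart creation does not go unnoticed:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=len(methods)) as pool:
    # Keep the Future returned by each submit() call
    futures = [pool.submit(method, commits, system_name, reset, chart_output)
               for method in methods]

# Leaving the with-block calls shutdown(wait=True); result() then
# re-raises any exception that was raised inside a method
for future in futures:
    future.result()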

The answer is to use:

import time

for method in methods:
    pool.submit(method, commits, system_name, reset, chart_output)
    time.sleep(as_many_you_want)

It depends on what method is doing. When using concurrency, shared mutable state must be avoided. The function you are trying to execute concurrently seems to access the Plotly graph, which is shared mutable state.

To avoid problems, you should only run code concurrently when it is reentrant, and the part of the code that mutates shared state should be executed synchronously.

One way to achieve this is to break method down into two functions: the first does the heavy work you want to parallelize (and must be reentrant), and the second plots the results synchronously.

Here is an example of how you could achieve this with the Python concurrent.futures module:

from concurrent.futures import ThreadPoolExecutor, as_completed

def heavy_work(arg):
  # Some heavy work...
  result = slow_function(arg)
  return result

def plot(result, figure):
  # Plot the result to a shared figure,
  # must be executed synchronously.
  figure.plot(result)

args = [...]  # List of arguments to `heavy_work`
figure = ...  # The shared figure

# Submit work to be executed concurrently
with ThreadPoolExecutor() as pool:
  futures = [pool.submit(heavy_work, arg) for arg in args]

# Serialize the calls to `plot`
for future in as_completed(futures):
  result = future.result()
  plot(result, figure)
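
Applied to the question's commits_by_date, that split could look roughly like the sketch below. It assumes a hypothetical prepare_commits_by_date helper that contains only the database and aggregation work from the question (the reentrant part), while the Plotly figure is built and written afterwards in the main thread; reset_db_data, retrieve_db_data and save_df_to_db are the question's own helpers.

from concurrent.futures import ThreadPoolExecutor

import plotly.graph_objects as go

def prepare_commits_by_date(commits, system_name, reset):
    # Reentrant part: database access and aggregation only
    collection_name = "commits_by_date"
    reset_db_data(reset, system_name, collection_name)
    date_commits = retrieve_db_data(system_name, collection_name)
    if len(date_commits) == 0:
        date_commits = commits.groupby('commit_date')[['sha']].count()
        date_commits = date_commits.rename(columns={'sha': 'commit_count'})
        date_commits.insert(0, "system_name", system_name)
        date_commits = date_commits.reset_index()
        save_df_to_db(date_commits, collection_name)
    return date_commits

def plot_commits_by_date(date_commits):
    # Synchronous part: build and write the figure in the main thread
    fig = go.Figure([go.Scatter(
        x=date_commits.commit_date,
        y=date_commits.commit_count,
        text=date_commits.commit_count,
        fill='tozeroy')])
    fig.update_layout(title='Commits by Date', yaxis_title='Commits Count')
    fig.write_html('commits_by_date.html', auto_open=True)

# Prepare the data concurrently...
with ThreadPoolExecutor() as pool:
    future = pool.submit(prepare_commits_by_date, commits, system_name, reset)

# ...then plot once the data is ready
if chart_output:
    plot_commits_by_date(future.result())

In the real code there would be one future per chart, collected with as_completed as in the example above, with every write_html call kept in the main thread.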
