Is matplotlib savefig threadsafe?

I have an in-house distributed computing library that we use all the time for parallel computing jobs. After the processes are partitioned, they run their data loading and computation steps and then finish with a "save" step. Usually this involves writing data to database tables.

But for a specific task, I need the output of each process to be a .png file with some data plots. There are 95 processes in total, so 95 .pngs.

Inside my "save" step (executed on each process), I have some very simple code that makes a boxplot with matplotlib's boxplot function, and some code that uses savefig to write it to a .png file with a unique name based on the specific data used in that process.

However, I occasionally see output where it appears that two or more sets of data were written into the same output file, despite the unique names.

Does matplotlib use temporary files when making boxplots or saving figures? If so, does it always use the same temp file names (thus leading to overwrite conflicts)? I have run my process under strace and cannot see anything that obviously looks like temp file writing from matplotlib.

How can I ensure that this will be threadsafe? I definitely want to do the file saving in parallel, as I am looking to expand the number of output .pngs considerably, so the option of first storing all the data and then serially executing the plot/save portion is very undesirable.

It's impossible for me to reproduce the full parallel infrastructure we are using, but below is the function that gets called to create the plot handle, and then the function that gets called to save the plot. You should assume for the sake of the question that the thread safety issue has nothing to do with our distributed library. We know it's not coming from our code, which has been used for years for our multiprocessing jobs without threading issues like this (especially not for something we don't directly control, like any temp files from matplotlib).

import pandas
import numpy as np
import matplotlib.pyplot as plt

def plot_category_data(betas, category_name):
    """
    Function to organize beta data by date into vectors and pass to box plot
    code for producing a single chart of multi-period box plots.
    """
    beta_vector_list = []
    yms = np.sort(betas.yearmonth.unique())
    for ym in yms:
        beta_vector_list.append(betas[betas.yearmonth==ym].Beta.values.flatten().tolist())
    ###

    plot_output = plt.boxplot(beta_vector_list)
    axs = plt.gcf().gca()
    axs.set_xticklabels(betas.FactorDate.unique(), rotation=40, horizontalalignment='right')
    axs.set_xlabel("Date")
    axs.set_ylabel("Beta")
    axs.set_title("%s Beta to BMI Global"%(category_name))
    axs.set_ylim((-1.0, 3.0))

    return plot_output
### End plot_category_data

def save(self):
    """
    Make calls to store the plot to the desired output file.
    """
    out_file = self.output_path + "%s.png"%(self.category_name)
    fig = plt.gcf()
    fig.set_figheight(6.5)
    fig.set_figwidth(10)
    fig.savefig(out_file, bbox_inches='tight', dpi=150)
    print("Finished and stored output file %s" % (out_file))
    return None
### End save

In your two functions, you're calling plt.gcf(). That returns pyplot's global "current figure", so if a worker ever handles more than one task, each boxplot call can end up drawing onto the same figure. I would try generating a new figure with plt.figure() every time you plot and referencing that one explicitly, so you skirt the whole issue entirely.
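As a rough sketch of what that could look like (not the asker's actual code): the plotting function below creates its own figure and returns it, and the save function takes that figure as an argument instead of calling plt.gcf(). The function name save_plot, the sample data, the temporary output directory, and the choice of the Agg backend are all assumptions made for illustration. Closing the figure after saving is also my addition, on the assumption that each worker should not keep old figures around.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless workers, no GUI backend needed
import matplotlib.pyplot as plt


def plot_category_data(beta_vector_list, category_name):
    """Draw the box plots on a brand-new figure and return that figure."""
    # A fresh figure per call, so nothing is shared through pyplot's
    # "current figure" state.
    fig, ax = plt.subplots(figsize=(10, 6.5))
    ax.boxplot(beta_vector_list)
    ax.set_xlabel("Date")
    ax.set_ylabel("Beta")
    ax.set_title("%s Beta to BMI Global" % category_name)
    ax.set_ylim((-1.0, 3.0))
    return fig


def save_plot(fig, out_file):
    """Save exactly the figure that was passed in, then release it."""
    fig.savefig(out_file, bbox_inches="tight", dpi=150)
    plt.close(fig)  # drop the figure from pyplot's registry
    return out_file


# Hypothetical usage with made-up data for two categories.
out_dir = tempfile.mkdtemp()
fig_a = plot_category_data([[0.1, 0.5, 0.9], [0.2, 0.6, 1.1]], "A")
fig_b = plot_category_data([[1.0, 1.5, 2.0], [0.8, 1.2, 1.9]], "B")
path_a = save_plot(fig_a, os.path.join(out_dir, "A.png"))
path_b = save_plot(fig_b, os.path.join(out_dir, "B.png"))
```

Because each figure is held in its own variable, plotting category B before saving category A does not mix the two outputs, which is the failure mode that relying on plt.gcf() can produce.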
