Creating plots with multiprocessing and time.strftime() doesn't work properly
I am trying to create plots with my script running in parallel using multiprocessing. I created two example scripts for this question, because the actual main script with the computing part would be too long. In script0.py you can see the multiprocessing part, where I start the actual script1.py, which does something 4 times in parallel. In this example it just creates some random scatterplots.
script0.py:

```python
import multiprocessing as mp
import os

def execute(process):
    os.system(f"python {process}")

if __name__ == "__main__":
    proc_num = 4
    process = []
    for _ in range(proc_num):
        process.append("script1.py")
    process_pool = mp.Pool(processes=proc_num)
    process_pool.map(execute, process)
```
script1.py:

```python
# just a random scatterplot, but works for my example
import time
import numpy as np
import matplotlib.pyplot as plt
import os

dir_name = "stackoverflow_question"
plot_name = time.strftime("Plot %Hh%Mm%Ss")  # note the time.strftime() function

if not os.path.exists(f"{dir_name}"):
    os.mkdir(f"{dir_name}")

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
# plt.show()
plt.savefig(f"{dir_name}/{plot_name}", dpi=300)
```
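A side note on the folder creation: because four copies of script1.py run at once, the `os.path.exists()` check and the `os.mkdir()` call can interleave. Two processes may both see the folder as missing, and the slower one then fails with `FileExistsError`. A minimal sketch of a race-free variant, assuming the same `stackoverflow_question` folder name (the helper name `ensure_dir` is hypothetical), uses `os.makedirs(..., exist_ok=True)`:

```python
import os

def ensure_dir(dir_name):
    # exist_ok=True means no FileExistsError is raised if another
    # process created the folder between our check and our call
    os.makedirs(dir_name, exist_ok=True)
    return dir_name

if __name__ == "__main__":
    ensure_dir("stackoverflow_question")
```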
The important thing is that I am naming the plot by its creation time:

```python
plot_name = time.strftime("Plot %Hh%Mm%Ss")
```
So this creates a string like "Plot 16h39m22s". So far so good. Now to my actual problem: I realized that when the processes are started in parallel, the plot names are sometimes identical, because time.strftime() returns the same time stamp. As a result, one instance of script1.py can overwrite a plot that another instance has already created.
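The collision follows directly from the format string: `%S` has one-second resolution, so every call made within the same wall-clock second yields the same name. A small illustration, using `time.localtime()` with two hypothetical timestamps half a second apart (`localtime()` ignores fractions of a second; `plot_name_at` is a hypothetical helper):

```python
import time

def plot_name_at(timestamp):
    # time.localtime() ignores the fractional seconds of the timestamp,
    # so %H, %M and %S can only tell whole seconds apart
    return time.strftime("Plot %Hh%Mm%Ss", time.localtime(timestamp))

t0 = 1_700_000_000.1  # hypothetical start time of the first process
t1 = t0 + 0.5         # a second process starting half a second later
print(plot_name_at(t0) == plot_name_at(t1))  # True: both get the same file name
```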
In my working script, where I have this exact problem, I'm generating a lot of data, so I need to name my plots and CSVs according to the date and time they were generated.
I already thought of passing a variable to script1.py when it gets called, but I don't know how to do that, since I have only just learned about the multiprocessing library. But this variable would have to vary as well, otherwise I would run into the same problem.
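If you want to keep the separate-script design, one way to hand script1.py a value that differs per run is a command-line argument. The sketch below is an assumption, not the asker's code: the launcher passes each task's index with `subprocess.run()` (instead of `os.system`) and `sys.executable`, and script1.py would read it from `sys.argv` and append it to the timestamp. The helper names `build_command` and `plot_name_from_argv` are hypothetical.

```python
import os
import subprocess
import sys
import time
from multiprocessing import Pool

def build_command(index):
    # sys.executable is the interpreter running this launcher; the index
    # travels to script1.py as its first command-line argument
    return [sys.executable, "script1.py", str(index)]

def execute(index):
    # A list of arguments is passed to the OS directly, with no shell involved;
    # check=True raises CalledProcessError if script1.py exits with an error
    return subprocess.run(build_command(index), check=True).returncode

def plot_name_from_argv(argv):
    # What script1.py could do: read the index (default 0) and make it part of the name
    index = int(argv[1]) if len(argv) > 1 else 0
    return time.strftime("Plot %Hh%Mm%Ss") + f"_{index}"

if __name__ == "__main__" and os.path.exists("script1.py"):
    # Only launch when script1.py sits in the current directory
    with Pool(processes=4) as pool:
        pool.map(execute, range(4))
```

Inside script1.py, `plot_name = plot_name_from_argv(sys.argv)` would then give names such as `Plot 16h39m22s_2`, which stay distinct even when all four processes start in the same second.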
Does anybody have a better idea of how I could achieve this? Thank you so much in advance.
I propose these approaches:
Welcome to the site. A couple of ideas...
First, you are not following the guidelines in the multiprocessing module on how to use Pool. You should use it in a context manager: `with Pool(...) as pool:`
There are many examples out there. See the warning in red in the docs:
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool
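Applied to script0.py, the context-manager form looks roughly like the sketch below. `make_plot_stub` is a hypothetical placeholder for the real work. Note that leaving the `with` block calls `terminate()` on the pool, so results have to be collected inside it; `map()` blocks until all tasks have finished, which makes that safe here.

```python
from multiprocessing import Pool

def make_plot_stub(index):
    # hypothetical placeholder for the plotting work done by one task
    return f"plot {index} done"

def run_all(n_tasks=4):
    # The with-statement guarantees the worker processes are cleaned up;
    # map() returns only after every task is done, before __exit__ terminates the pool
    with Pool(processes=n_tasks) as pool:
        return pool.map(make_plot_stub, range(n_tasks))

if __name__ == "__main__":
    print(run_all())  # ['plot 0 done', 'plot 1 done', 'plot 2 done', 'plot 3 done']
```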
Also, using os.system calls is a little odd/unsafe. Why don't you just put your plotting routine into a standard function, in the same module or a different one, and import it? That would allow you to pass additional info (like a good label) to the function. I would expect something like this, where source is a data file or external source...
```python
def make_plot(source, output_file_name, plot_label):
    # read the data source
    # make the plot
    # save it to the output path...
    ...
```
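Filled in with matplotlib, that outline might look like the sketch below. It assumes, for illustration only, that `source` is an `(x, y)` pair of equal-length sequences rather than a file to read. It uses the object-oriented `Figure`/`Axes` interface and the non-interactive Agg backend, a common choice in worker processes that only save files and never open a window.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: workers save files, never show windows
import matplotlib.pyplot as plt

def make_plot(source, output_file_name, plot_label):
    # Assumption: `source` is an (x, y) pair of equal-length sequences
    x, y = source
    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.5)
    ax.set_title(plot_label)
    # Create the target folder if needed; exist_ok avoids races between processes
    folder = os.path.dirname(output_file_name)
    if folder:
        os.makedirs(folder, exist_ok=True)
    fig.savefig(output_file_name, dpi=300)
    plt.close(fig)  # release the figure so long-running workers don't accumulate them
    return output_file_name
```

Because the label and output path are ordinary arguments, each pool task can receive its own values, which is what makes the uniqueness problem easy to solve.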
As far as the label is concerned, of course there is going to be overlap if you start these processes within the same second. So you can either append the process number, or some other piece of info such as something from the data source, to the label, or use the same timestamp but put the output in unique folders, as suggested in the other answer.
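The first option can be as small as the helper sketched below (the name `unique_label` is hypothetical). It keeps the human-readable timestamp and appends the task's index, so labels created in the same second still differ; `os.getpid()` would be an alternative suffix when no index is available.

```python
import time

def unique_label(index, timestamp=None):
    # The same human-readable timestamp for every task started in one run
    # (timestamp=None means "now")...
    stamp = time.strftime("Plot %Hh%Mm%Ss", time.localtime(timestamp))
    # ...made unique by appending the task's index
    return f"{stamp}_{index}"

labels = [unique_label(i, timestamp=1_700_000_000) for i in range(3)]
print(len(set(labels)))  # 3: no two tasks share a label
```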
I would think of something like this...
```python
from multiprocessing import Pool
import time

def f(data, output_folder, label):
    # here data is just an integer; in yours, it would be the source of the graph data...
    val = data * data
    # the below is just an example... you could use your folder making/saving routine here...
    return f'now we can save {label} in folder {output_folder} with value: {val}'

if __name__ == '__main__':
    with Pool(5) as p:
        folders = ['data1', 'data2', 'data3']
        labels = [time.strftime("Plot %Hh%Mm%Ss")] * 3
        x_s = [1, 2, 3]
        output = p.starmap(f, zip(x_s, folders, labels))
    for result in output:
        print(result)
```
Output:

```text
now we can save Plot 08h55m17s in folder data1 with value: 1
now we can save Plot 08h55m17s in folder data2 with value: 4
now we can save Plot 08h55m17s in folder data3 with value: 9
```