Script using multiprocessing with partial and map failing on Python > 3, working fine on 2.7, cannot pickle '_thread.lock'
Until today I used the following code on Python 2.7 to parallelize the creation of many PNG pictures with matplotlib. Today I tried to move everything to Python 3.8, and the part that I cannot adapt involves the parallelization done with multiprocessing.
The idea is that I have a script which needs to produce several images with similar settings from different timesteps of a data file. As the plotting routine can be parametrized, I execute it over chunks of 10 timesteps distributed among different tasks to speed up the process.

Here is the relevant part of the script, which I'm not going to paste in full given its length.
from multiprocessing import Pool
from functools import partial

def main():
    # arguments to be passed to the plotting functions;
    # they contain data and information about the plot
    args = dict(m=m, x=x, y=y, ax=ax,
                winds_10m=winds_10m, mslp=mslp, ....)
    # chunks of timesteps
    dates = chunks(time, 10)
    # partial version of the function plot_files(), see underneath
    plot_files_param = partial(plot_files, **args)
    p = Pool(8)
    p.map(plot_files_param, dates)

def plot_files(dates, **args):
    first = True
    for date in dates:
        # loop over dates, retrieve data from args, e.g. args['mslp'], and do the plotting

if __name__ == "__main__":
    import time
    start_time = time.time()
    main()
    elapsed_time = time.time() - start_time
    print_message("script took " + time.strftime("%H:%M:%S", time.gmtime(elapsed_time)))
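The `chunks()` helper used above is not shown in the snippet. A common implementation (an assumption on my part, not the author's exact code) splits a sequence into successive pieces of at most `n` elements:

```python
def chunks(seq, n):
    """Split seq into successive lists of at most n elements each."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

# 25 timesteps in chunks of 10 -> lists of 10, 10 and 5 elements
print(chunks(list(range(25)), 10))
```

Each chunk then becomes one item of the iterable handed to `Pool.map`, so each worker plots up to 10 timesteps per task.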
This used to work fine on Python 2.7, but now I get this error:
Traceback (most recent call last):
File "plot_winds10m.py", line 135, in <module>
main()
File "plot_winds10m.py", line 79, in main
p.map(plot_files_param, dates)
File "lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
put(task)
File "lib/python3.8/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object
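The error can be reproduced independently of `multiprocessing`: `Pool.map` pickles the task function and its arguments to ship them to the worker processes, and any object whose state holds a thread lock fails that serialization step with the same `TypeError`. A minimal demonstration:

```python
import pickle
import threading

# multiprocessing serializes task arguments with pickle before sending
# them to workers; a raw lock object cannot be serialized.
try:
    pickle.dumps(threading.Lock())
except TypeError as err:
    print(err)  # e.g. "cannot pickle '_thread.lock' object"
```

So the question becomes: which of the objects inside `args` carries a hidden lock?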
The only thing that changed, besides the Python and package versions, is the system: I'm testing this on macOS instead of Linux, but that should not make a big difference, especially since this is all running inside a conda environment.

Does anyone have an idea of how to fix this?

(Here is the link to the GitHub repo: https://github.com/guidocioni/icon_forecasts/blob/master/plotting/plot_winds10m.py )
I figured out the problem, in case anyone arrives here desperate for an answer.
The problem is that some of the conversions I was doing with metpy's unit_array produce a pint array which, for some reason, is not picklable. When I then passed this array in the args of the partial function, I got the error.
Doing the conversion with .convert_units() instead, or just extracting the array part from the data (either with .values or .magnitude), ensured that I was passing only a numpy array or a DataArray, and these objects are picklable.