
What's the fastest way to generate millions of PNG files using Matplotlib?

For a deep learning project, I need to synthesize plots for each item in my dataset. This means generating 2.5 million plots, each 224x224 pixels.

So far the best I've been able to do is this, which takes 2.7 seconds to run on my PC:

from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
import matplotlib.pyplot as plt

for i in range(100):
    # 4 in x 56 dpi = 224 x 224 px output
    fig = plt.Figure(frameon=False, facecolor="white", figsize=(4, 4))
    ax = fig.add_subplot(111)
    ax.axis('off')
    ax.plot([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 8, 6, 4, 3])
    canvas = FigureCanvas(fig)
    canvas.print_figure(str(i), dpi=56)

A resulting image (from this reproducible example) looks like this:

[resulting 224×224 plot image]

The real images use a bit more data (200 rows) but that makes little difference to speed.

At the speed above it will take around 18 hours to generate all my plots. Are there any clever ways to speed this up?
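One common way to cut per-plot overhead in Matplotlib (not from the original post) is to build the figure, axes, and line artist once, then only swap the line's data and re-render each iteration. A minimal sketch, assuming the Agg backend; the actual speedup will depend on your Matplotlib version:

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure

# Build the figure and artist once, outside the loop.
fig = Figure(frameon=False, facecolor="white", figsize=(4, 4))
canvas = FigureCanvas(fig)
ax = fig.add_subplot(111)
ax.axis('off')
line, = ax.plot([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 8, 6, 4, 3])

for i in range(10):
    # In the real task each dataset item would supply different x/y here.
    # If the data range changes, also call ax.relim(); ax.autoscale_view().
    line.set_data([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 8, 6, 4, 3])
    canvas.print_figure(str(i), dpi=56)  # 4 in * 56 dpi = 224 px
```

This avoids re-running figure and axes construction for every image, which is a large share of the cost in the loop above.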

Per the comment from AKX, Pillow's ImageDraw.line() performs much faster for this task:

from PIL import Image, ImageDraw
from itertools import chain

scale = 224
pad = 5
scale_pad = scale - pad * 2

for i in range(200):
    im = Image.new('RGB', (scale, scale), (255, 255, 255))
    draw = ImageDraw.Draw(im)
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 4, 6, 8, 8, 6, 4, 3]
    # Normalise each series into the padded pixel range [pad, scale - pad].
    x = [pad + (v - min(x)) / (max(x) - min(x)) * scale_pad for v in x]
    y = [pad + (v - min(y)) / (max(y) - min(y)) * scale_pad for v in y]
    # Interleave into the flat [x0, y0, x1, y1, ...] sequence ImageDraw.line() expects.
    draw.line(list(chain.from_iterable(zip(x, y))), fill=(0, 0, 0), width=4)
    im.save(f"{i}.png")

This performs about 6x faster than Matplotlib, meaning my task should take only ~3 hours instead of 18.
