
matplotlib scatter plotting over png

I'm trying to make a scatter plot of ~6 million points in order to look for clustering in the data.

When I try to do this with a single scatter call, matplotlib complains about excessive memory use. So I decided to plot 3000 points, save the figure as a .png, clear the figure, load the saved .png with imread(), and then overlay the next 3000 points.

I'm running into padding issues and I don't understand where they come from. My real code is rather long because it parses many text files, but the mockup below reproduces my approach:

import matplotlib.pyplot as plt
# First pass: plot one point on fixed axes and save the figure.
fig, ax = plt.subplots()
plt.xlim(0,1000)
plt.ylim(-1000,1000)
plt.scatter(400,500,marker="+",c="r")
plt.gca().set_aspect('equal')
plt.draw()
plt.savefig(r"C:\TMP\fig1.png")  # writes the whole figure, margins and ticks included
plt.clf()
# Second pass: load the saved image as a background and plot the next point on top.
im = plt.imread(r"C:\TMP\fig1.png")
implot = plt.imshow(im, origin='upper', aspect='equal', extent=[0,1000,-1000,1000], zorder=0)
plt.scatter(600,500,marker="+",c="b")
plt.savefig(r"C:\TMP\fig2.png")
plt.close(fig)

I don't know how to interpret the result. Clearly I do not understand the relationship between "aspect" and "extent" in imshow(). Can somebody help me with this?

Figure 1 (image: fig1.png)

Figure 2 (image: fig2.png)

I was expecting fig1.png and fig2.png to overlay perfectly on top of one another.
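One likely reading of the mismatch, offered as a hedged sketch rather than as the method of anyone in this thread: savefig writes the entire figure canvas, including margins, ticks and labels, while extent tells imshow to stretch that whole image over the data rectangle [0, 1000] x [-1000, 1000]. The saved figure therefore reappears shrunk inside the new axes. The sketch below assumes axes that fill the canvas exactly (fig.add_axes([0, 0, 1, 1]) with the axis decorations turned off) and a 1:2 figure size to match the data range. The helper new_axes, the figure size, the Agg backend and the in-memory buffers are illustrative choices, not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt

XLIM, YLIM = (0, 1000), (-1000, 1000)
EXTENT = [*XLIM, *YLIM]  # left, right, bottom, top in data coordinates


def new_axes():
    """Hypothetical helper: a figure whose single axes covers the whole canvas."""
    fig = plt.figure(figsize=(3, 6), dpi=100)  # 1:2, same ratio as the data range
    ax = fig.add_axes([0, 0, 1, 1])            # left, bottom, width, height (figure fractions)
    ax.set_axis_off()                          # no ticks, labels or frame in the PNG
    ax.set_xlim(*XLIM)
    ax.set_ylim(*YLIM)
    return fig, ax


# Pass 1: draw the first point and save the canvas to an in-memory PNG.
fig, ax = new_axes()
ax.scatter(400, 500, marker="+", c="r")
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)

# Pass 2: read the PNG back and use it as the background of identical axes.
buf.seek(0)
im = plt.imread(buf, format="png")
fig, ax = new_axes()
ax.imshow(im, origin="upper", extent=EXTENT, aspect="equal", zorder=0)
ax.scatter(600, 500, marker="+", c="b", zorder=1)
buf2 = io.BytesIO()
fig.savefig(buf2, format="png")
plt.close(fig)
```

Because the saved image now contains only the data rectangle, mapping it back with the same extent should put the red marker where it was originally drawn, with the blue marker beside it.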

I profiled memory usage with memory_profiler on a realistic example of 6M points.

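import time
import matplotlib.pyplot as plt
import numpy as np
# 6 million normally distributed points
x = np.random.normal(size=6000000)
y = np.random.normal(size=6000000)

# Time a single scatter call over all the points
start = time.time()
plt.scatter(x, y, alpha=0.1)
end = time.time() - start  # elapsed seconds
print(end)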

The output is 30.015294551849365 seconds, which is not terribly slow.

On the other hand, the profile output:

Line #    Mem usage    Increment   Line Contents
================================================
 5   81.738 MiB    0.000 MiB   @profile
 6                             def make_test():
 7  127.516 MiB   45.777 MiB       x = np.random.normal(size=6000000)
 8  173.293 MiB   45.777 MiB       y = np.random.normal(size=6000000)
 9                             
10  282.934 MiB  109.641 MiB       plt.scatter(x, y, alpha=0.1)
11  298.160 MiB   15.227 MiB       plt.savefig('big_plot')
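As a sanity check on the table, the two 45.777 MiB increments are roughly what the raw arrays should cost. A minimal sketch of the arithmetic, assuming NumPy's default float64 dtype (8 bytes per value); the variable names are illustrative:

```python
import numpy as np

n_points = 6_000_000
x = np.random.normal(size=n_points)  # float64 by default

bytes_per_value = x.dtype.itemsize   # 8 bytes for float64
mib = x.nbytes / 2**20               # 1 MiB = 2**20 bytes

print(f"{bytes_per_value} bytes x {n_points} values = {mib:.3f} MiB")
```

This comes to about 45.78 MiB per array, in line with the profiler's figure, so the remaining growth comes from the scatter and savefig calls rather than from the data itself.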

Memory peaks at about 300 MiB, so memory is not the problem either. The problem lies elsewhere; you should be able to plot ALL the points together in a single call.
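A minimal sketch of the single-call approach, under a few assumptions that are not in the original: the Agg backend is selected so that no GUI window is needed, the hypothetical helper scatter_all writes the PNG to memory instead of to a file, the marker size s=1 is chosen only to keep dense regions readable, and the demo uses fewer points so that it runs quickly.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np


def scatter_all(n_points, seed=0):
    """Draw n_points random points in one scatter call and return PNG bytes."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_points)
    y = rng.normal(size=n_points)

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=1, alpha=0.1)  # a single call for every point
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)                    # release the figure once it is saved
    return buf.getvalue()


png_bytes = scatter_all(100_000)      # the example in the text uses 6_000_000
```

Raising n_points to 6_000_000 should reproduce the scale discussed above, at the cost of a longer render time.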

Finally, the scatter plot:

big_scatter_plot
