
Why is matplotlib plotting so much slower than pd.DataFrame.plot()?

Hello dear Community,

I couldn't find anything similar while searching, and I hope I haven't overlooked anything. I have the following issue:

I have a big dataset whose shape is 1352x121797 (samples x time points). I have clustered the samples and would like to generate one plot per cluster, with every time series in that cluster drawn on it.

However, with the matplotlib syntax this is extremely slow, and I'm not sure why. Even after 5-10 minutes it hasn't finished.

import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots()

for index, values in subset_cluster.iterrows(): # One Cluster subset, dataframe of shape (11x121797)
    ax.plot(values)

fig.savefig('test.png')

Even when I insert a break after ax.plot(values), so that only a single series is plotted, it still doesn't finish. I'm using Spyder and thought it might be because Spyder always renders the plot inline in the console.

However, when I use the pandas Series method values.plot() instead of ax.plot(values), the plot appears and is saved in about 1-2 seconds.

I need matplotlib's customization options to standardize all the plots and make them look a bit nicer, so I would prefer to use the matplotlib syntax. Does anyone have any ideas?

Thanks in advance

Edit: after experimenting a bit, it seems that rendering is the time-consuming part. With the backend set via matplotlib.use('Agg'), the plot command finishes more quickly (when using plt.plot() instead of ax.plot()), but plt.savefig() then takes forever. Still, it should finish in a reasonable amount of time, right? Even for about 121,000 data points per series.
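To check whether the time really goes into rendering rather than into the plot calls, the two steps can be timed separately. The following is only a rough sketch under assumptions: time_plot_and_save is a hypothetical helper name, the data is random stand-in data rather than the real cluster, and plain NumPy arrays are passed so that no pandas index is involved.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as mentioned in the edit above
import matplotlib.pyplot as plt
import numpy as np


def time_plot_and_save(y_rows):
    """Hypothetical helper: time the ax.plot() calls and savefig() separately."""
    fig, ax = plt.subplots()
    x = np.arange(y_rows.shape[1])  # explicit numeric x values

    t0 = time.perf_counter()
    for row in y_rows:
        ax.plot(x, row)
    t1 = time.perf_counter()

    buf = io.BytesIO()  # render to memory instead of a file
    fig.savefig(buf, format="png")
    t2 = time.perf_counter()

    plt.close(fig)
    return t1 - t0, t2 - t1, buf.getbuffer().nbytes


# Stand-in for one cluster: 11 series; the length is reduced here for a quick run.
demo = np.random.default_rng(0).random((11, 20_000))
plot_seconds, save_seconds, png_bytes = time_plot_and_save(demo)
print(f"plot: {plot_seconds:.3f}s, savefig: {save_seconds:.3f}s, {png_bytes} bytes")
```

If this runs quickly with numeric arrays but the original loop does not, the slowdown probably comes from what matplotlib receives as x values, not from the number of points.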

Posting as an answer as it may help OP or someone else: I had the same problem and found that it was because the data I was using for the x-axis had dtype object, while the y-axis data was float64. After explicitly converting the object column to datetime, plotting with Matplotlib was as fast as Pandas' df.plot(). I guess Pandas does a better job of interpreting the data type when plotting.

OP, you might want to check whether the values you are plotting have the right type, or whether, like me, you had some problems when loading the dataframe from file.
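A minimal sketch of that check, under assumptions: the question does not show how subset_cluster was loaded, so make_demo_cluster below builds a small hypothetical stand-in whose column labels are timestamp strings (dtype object), and plot_cluster is a hypothetical helper name. Matplotlib treats string x values as categories and creates one tick per distinct label, which is a likely reason why ax.plot(values) is so slow when the labels are not numeric or datetime.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_demo_cluster(n_series=3, n_points=500):
    """Hypothetical stand-in for subset_cluster: columns are timestamp strings."""
    stamps = pd.date_range("2021-01-01", periods=n_points, freq="min")
    labels = pd.Index([str(t) for t in stamps], dtype=object)
    data = np.random.default_rng(0).random((n_series, n_points))
    return pd.DataFrame(data, columns=labels)


def plot_cluster(cluster):
    """Convert object-typed column labels to datetime, then plot row by row."""
    cluster = cluster.copy()
    if cluster.columns.dtype == object:
        # Assumption: the labels are parseable timestamps. For plain numbers
        # stored as strings, pd.to_numeric would be the analogous call.
        cluster.columns = pd.to_datetime(cluster.columns)

    fig, ax = plt.subplots()
    x = cluster.columns.to_numpy()
    for _, values in cluster.iterrows():
        # Pass explicit x and y arrays so matplotlib sees datetime64/float data.
        ax.plot(x, values.to_numpy(dtype=float))
    return fig, ax


demo = make_demo_cluster()
print(demo.columns.dtype)  # inspect the label type before plotting
fig, ax = plot_cluster(demo)
fig.savefig("test.png")
plt.close(fig)
```

Printing subset_cluster.columns.dtype and subset_cluster.dtypes on the real data is the quickest way to see whether this applies to your case.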
