Time-series plotting inconsistencies in Pandas
Say I have a dataframe df where df.index consists of datetime objects, e.g.
> df.index[0]
datetime.date(2014, 5, 5)
If I plot it, Pandas nicely preserves the datetime type in the plot, which allows the user to change the time-series sampling as well as the formatting options of the plot:
import matplotlib.pyplot as plt
import matplotlib.dates

# Plot the dataframe (df is indexed by dates, as above):
f = plt.figure(figsize=(8, 8))
ax = f.add_subplot(1, 1, 1)
lines = df.plot(ax=ax)
# Choose the sampling rate in terms of dates (a tick on every day of the week):
ax.xaxis.set_major_locator(matplotlib.dates.WeekdayLocator(byweekday=(0, 1, 2, 3, 4, 5, 6),
                                                           interval=1))
# We can also re-sample the X axis numerically if we want (e.g. every 4 steps):
N = 4
ticks = ax.xaxis.get_ticklocs()
ticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]
# Keep every N-th tick, counting back from the last one:
ax.xaxis.set_ticks(ticks[-1::-N][::-1])
ax.xaxis.set_ticklabels(ticklabels[-1::-N][::-1])
# Choose a date formatter using a date-friendly syntax:
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b\n%d'))
plt.show()
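The numeric re-sampling step keeps every N-th tick counting back from the last one, so the right-most tick always survives. A minimal sketch of just that slicing, using a hypothetical helper name (every_nth_from_end is not part of pandas or matplotlib):

```python
def every_nth_from_end(seq, n):
    """Keep every n-th element of seq, anchored at the last element (hypothetical helper)."""
    # seq[-1::-n] walks backwards from the last item in steps of n;
    # the trailing [::-1] restores the original left-to-right order.
    return list(seq[-1::-n][::-1])


ticks = list(range(10))
print(every_nth_from_end(ticks, 4))  # [1, 5, 9]
```

The same slicing works on the NumPy array returned by get_ticklocs(), since arrays support negative-step slices too.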
However, the above does not work for a boxplot (the tick labels for the x axis are rendered empty):
df2.boxplot(column='A', by='created_dt',ax=ax, sym="k.")
# same code as above ...
It looks like in the last example Pandas converts the x-axis labels into strings, so the formatters and locators no longer work.
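One way to check the string-label explanation is to look at where a pandas grouped boxplot puts its ticks. A sketch with hypothetical data shaped like df2 (the column names A and created_dt come from the question; the values are random), assuming a headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's df2: values 'A' grouped by a date column.
rng = np.random.default_rng(0)
dates = pd.date_range("2014-05-05", periods=5, freq="D")
df2 = pd.DataFrame({"created_dt": np.repeat(dates, 10), "A": rng.normal(size=50)})

fig, ax = plt.subplots()
df2.boxplot(column="A", by="created_dt", ax=ax, sym="k.")

# The boxes sit at plain integer positions 1..n; each date survives only as tick text,
# so a date locator or formatter would read 1, 2, 3... as date numbers.
print([float(x) for x in ax.get_xticks()])
print([t.get_text() for t in ax.get_xticklabels()])
```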
This post re-uses solutions from the following threads:
Why? How can I use boxplot in a way that allows me to use matplotlib date locators and formatters?
No, actually even the line plot is not working correctly. If you make the year show up, you will notice the problem: in the following example the xticks fall in 1989 instead of 2000.
In [49]:
df=pd.DataFrame({'Val': np.random.random(50)})
df.index=pd.date_range('2000-01-02', periods=50)
f = plt.figure()
ax = f.add_subplot(1,1,1)
lines = df.plot(ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
print(ax.get_xlim())
(10958.0, 11007.0)
In [50]:
matplotlib.dates.strpdate2num('%Y-%M-%d')('2000-01-02')  # note: %M is minutes; '%Y-%m-%d' parses the month
Out[50]:
730121.0006944444
In [51]:
matplotlib.dates.num2date(730121.0006944444)
Out[51]:
datetime.datetime(2000, 1, 2, 0, 1, tzinfo=<matplotlib.dates._UTC object at 0x051FA9F0>)
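The fractional part of 730121.0006944444 comes from the format string: %M is the minutes directive, %m is the month. A sketch using the standard library's datetime.strptime (used here instead of strpdate2num, which later matplotlib releases deprecated and removed) reproduces the extra minute seen in Out[51]:

```python
from datetime import datetime

# '%M' is the minutes directive, so '01' is parsed as one minute and the month
# falls back to its default of January; '%m' is the month directive.
wrong = datetime.strptime("2000-01-02", "%Y-%M-%d")
right = datetime.strptime("2000-01-02", "%Y-%m-%d")
print(wrong)  # 2000-01-02 00:01:00
print(right)  # 2000-01-02 00:00:00

# One minute expressed as a fraction of a day: the .000694... seen in Out[50].
minute_fraction = (wrong - right).total_seconds() / 86400
print(minute_fraction)
```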
It turns out that datetime data is handled differently in pandas and matplotlib: in the latter, 2000-1-2 should be 730121 (the .0006944444 fraction above is one extra minute, because %M is the minutes code), instead of the 10958.0 used by pandas.
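Both numbers are whole days counted from different origins. A sketch with only the standard library, assuming that for a daily index pandas places points at the Period ordinal (days since 1970-01-01) and that matplotlib uses its pre-3.3 convention (days since 0001-01-01, plus one):

```python
from datetime import date

d = date(2000, 1, 2)

# pandas' daily axis value: days elapsed since the Unix epoch 1970-01-01.
pandas_style = (d - date(1970, 1, 1)).days
# Classic matplotlib date number (before matplotlib 3.3): days since 0001-01-01, plus one,
# which coincides with Python's proleptic Gregorian ordinal.
classic_mpl = d.toordinal()

print(pandas_style)  # 10958
print(classic_mpl)   # 730121
```

Matplotlib 3.3 moved its default epoch to 1970-01-01, so on recent versions date2num(datetime(2000, 1, 2)) also returns 10958.0 and this particular mismatch may not reproduce. Plotting through matplotlib directly does not depend on which epoch is in effect.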
To get it right, we need to avoid using the pandas plot method:
In [52]:
plt.plot_date(df.index.to_pydatetime(), df.Val, fmt='-')
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
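Recent matplotlib documentation discourages plot_date in favour of plain plot with datetime x values. A minimal sketch of the same fix with the object-oriented API, assuming the Agg backend and a Monday-only WeekdayLocator (both choices are illustrative, not from the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Same shape of data as In [49]: 50 daily values starting on 2000-01-02.
df = pd.DataFrame({"Val": np.random.default_rng(0).random(50)},
                  index=pd.date_range("2000-01-02", periods=50))

fig, ax = plt.subplots()
# Plot through matplotlib directly, passing Python datetimes as the x values,
# so the axis holds matplotlib date numbers rather than pandas period ordinals.
ax.plot(df.index.to_pydatetime(), df["Val"], "-")
# Date locators and formatters now interpret the axis correctly.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%y%b\n%d"))

# Convert the tick positions back to dates to check which year they fall in.
tick_years = {mdates.num2date(t).year for t in ax.get_xticks()}
print(tick_years)
```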
Similarly for a bar plot:
In [53]:
plt.bar(df.index.to_pydatetime(), df.Val, width=0.4)
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
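The same idea can be sketched for the boxplot from the question: skip the pandas wrapper, call matplotlib's boxplot directly, and place each box at its date's date number via positions. This assumes matplotlib 3.1 or later for manage_ticks (older versions call it manage_xticks); the data below is hypothetical, shaped like df2:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's df2: column 'A' grouped by 'created_dt'.
rng = np.random.default_rng(0)
dates = pd.date_range("2014-05-05", periods=10, freq="D")
df2 = pd.DataFrame({"created_dt": np.repeat(dates, 20), "A": rng.normal(size=200)})

# One array of 'A' values per date, in date order (groupby sorts the keys).
keys, groups = zip(*[(k, g["A"].to_numpy()) for k, g in df2.groupby("created_dt")])
# Place each box at its date's matplotlib date number instead of at 1..n.
positions = mdates.date2num([k.to_pydatetime() for k in keys])

fig, ax = plt.subplots()
# manage_ticks=False keeps boxplot from installing fixed ticks and labels,
# so the date locator and formatter below stay in charge of the x axis.
ax.boxplot(groups, positions=positions, widths=0.6, sym="k.", manage_ticks=False)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%d"))

fig.canvas.draw()
print([t.get_text() for t in ax.get_xticklabels()])
```

The widths are in days here, because the x axis is measured in date numbers.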