
Time-series plotting inconsistencies in Pandas

Say I have a DataFrame df whose index consists of datetime objects, e.g.:

> df.index[0]
datetime.date(2014, 5, 5)

If I plot it, pandas nicely preserves the datetime type on the x axis, which lets me change the sampling of the time ticks as well as their formatting:

  import matplotlib.dates
  import matplotlib.pyplot as plt

  # df: a DataFrame whose index holds dates, as above

  # Plot the dataframe:
  f     = plt.figure(figsize=(8,8))
  ax    = f.add_subplot(1,1,1)
  lines = df.plot(ax=ax)

  # Choose the sampling rate in terms of dates:
  ax.xaxis.set_major_locator(matplotlib.dates.WeekdayLocator(byweekday=(0,1,2,3,4,5,6),
                                                            interval=1))

  # We can also re-sample the X axis numerically if we want (e.g. every 4 steps):
  N = 4

  # Tick label texts are only filled in once the figure has been drawn
  f.canvas.draw()
  ticks      = ax.xaxis.get_ticklocs()
  ticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]

  ax.xaxis.set_ticks(ticks[-1::-N][::-1])
  ax.xaxis.set_ticklabels(ticklabels[-1::-N][::-1])

  # Choose a date formatter using a date-friendly syntax
  # (this replaces the fixed labels set just above but keeps the re-sampled tick positions):
  ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b\n%d'))

  plt.show()

However, the above does not work for a boxplot (the x-axis tick labels are rendered empty):

df2.boxplot(column='A', by='created_dt',ax=ax, sym="k.")

# same code as above ...

It looks like in the last example pandas converts the x-axis labels to strings, so the date formatters and locators no longer work.
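A minimal check of this with plain matplotlib, assuming (from a reading of pandas' plotting code, not from the question) that pandas draws the grouped box plot with matplotlib's Axes.boxplot and then attaches the group keys as text labels. By default the boxes sit at x = 1, 2, ..., n and the axis gets a FixedLocator/FixedFormatter pair, so there are no date numbers for a date locator or formatter to work with. The three random groups are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

rng = np.random.default_rng(0)
# Three made-up groups, standing in for the values of 'A' on three dates
groups = [rng.random(20) for _ in range(3)]

fig, ax = plt.subplots()
ax.boxplot(groups)

locator = ax.xaxis.get_major_locator()
formatter = ax.xaxis.get_major_formatter()

# The boxes sit at plain integer positions, not at date numbers
print([float(x) for x in locator()])
# ...and the tick labels come from a fixed list rather than a date formatter
print(type(locator).__name__, type(formatter).__name__)
```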

This post re-uses solutions from the following threads:

  1. Accepted answer to "Pandas timeseries plot setting x-axis major and minor ticks and labels"
  2. Accepted answer to "Pandas: bar plot xtick frequency"

Why? And how can I use boxplot in a way that allows me to use matplotlib's date locators and formatters?

No; in fact even the line plot does not work correctly. If you make the year show up in the tick labels, you will notice the problem: in the following example the x ticks are in 1989 instead of 2000.

In [49]:
df=pd.DataFrame({'Val': np.random.random(50)})
df.index=pd.date_range('2000-01-02', periods=50)
f     = plt.figure()
ax    = f.add_subplot(1,1,1)
lines = df.plot(ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
print(ax.get_xlim())
(10958.0, 11007.0)


In [50]:
matplotlib.dates.strpdate2num('%Y-%m-%d')('2000-01-02')
Out[50]:
730121.0
In [51]:
matplotlib.dates.num2date(730121.0)
Out[51]:
datetime.datetime(2000, 1, 2, 0, 0, tzinfo=<matplotlib.dates._UTC object at 0x051FA9F0>)

It turns out that datetime data are handled differently in pandas and matplotlib: in matplotlib versions before 3.3, 2000-01-02 is 730121.0 (days counted from 0001-01-01), whereas pandas plots it at 10958.0, its daily period ordinal (days counted from 1970-01-01). Note that the format string needs %m (month); %M means minutes and would add a spurious one-minute fraction.
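The two numbers are the same calendar day counted from different origins, which the standard library alone can confirm. Under matplotlib's pre-3.3 convention a date number's whole-day part equals the proleptic Gregorian ordinal from date.toordinal() (0001-01-01 is 1), while pandas' daily ordinal is the day count from 1970-01-01. Read as an old-style matplotlib date, pandas' 10958 therefore lands in the first century, nowhere near 2000. Matplotlib 3.3 moved its default epoch to 1970-01-01 (the date.epoch rcParam), so on 3.3 and later date2num gives 10958.0 for this date; the mismatch shown here is specific to older releases.

```python
from datetime import date

day = date(2000, 1, 2)

# pandas' daily period ordinal: days elapsed since 1970-01-01
pandas_days = (day - date(1970, 1, 1)).days

# matplotlib < 3.3 date number (whole days): proleptic Gregorian ordinal, 0001-01-01 == 1
old_mpl_days = day.toordinal()

# What an old-style matplotlib axis makes of pandas' number
misread = date.fromordinal(pandas_days)

print(pandas_days)   # 10958
print(old_mpl_days)  # 730121
print(misread)       # 0031-01-01
```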

To get it right, we need to avoid pandas's plot method and call matplotlib directly:

In [52]:
plt.plot_date(df.index.to_pydatetime(), df.Val, fmt='-')
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))

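Recent matplotlib releases discourage plot_date (it is deprecated as of 3.9) in favour of plain plot, which accepts datetime x values directly through matplotlib's own date converter. A minimal sketch of the same fix under that assumption, rebuilding df with random values as above; the weekly DayLocator(interval=7) is an illustrative choice, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the answer: 50 daily values from 2000-01-02
df = pd.DataFrame({'Val': np.random.random(50)},
                  index=pd.date_range('2000-01-02', periods=50))

fig, ax = plt.subplots()
# Plain datetime objects are converted with matplotlib's own date units
ax.plot(df.index.to_pydatetime(), df.Val, '-')

# Matplotlib's date locators and formatters now line up with the data
ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%y%b\n%d'))

# The leftmost data point maps back to the first date of the index
first_day = mdates.num2date(ax.dataLim.xmin).date()
print(first_day)
```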

Similarly for a bar plot:

In [53]:
plt.bar(df.index.to_pydatetime(), df.Val, width=0.4)
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))

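The same idea can be carried over to the box plot from the question; this is a sketch, not part of the original answer. It calls matplotlib's Axes.boxplot directly, passes the dates (converted with date2num) as positions, and sets manage_ticks=False (matplotlib 3.1 or later) so boxplot does not install its own fixed tick labels. The df2 frame below, with columns created_dt and A, is a made-up stand-in for the question's data. Because matplotlib date numbers are in days, widths=0.6 makes each box 0.6 days wide.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df2: ten values of 'A' per day
days = pd.date_range('2000-01-02', periods=5)
rng = np.random.default_rng(0)
df2 = pd.DataFrame({'created_dt': days.repeat(10), 'A': rng.random(50)})

# One array of 'A' values per date, in date order
data = [df2.loc[df2.created_dt == d, 'A'].values for d in days]
# Box positions expressed in matplotlib's date units (days)
positions = mdates.date2num(days.to_pydatetime())

fig, ax = plt.subplots()
ax.boxplot(data, positions=positions, widths=0.6, sym='k.', manage_ticks=False)

# Date locators and formatters now apply, because the x axis holds date numbers
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%y%b\n%d'))
```

Since the boxes now live on a real date axis, any of the locator and formatter code from the question (including the every-N re-sampling) can be applied to ax unchanged.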
