
Matplotlib/pandas: quarterly bar chart with a datetime index not working

I have a pandas Series with a datetime index that I am trying to visualize as a bar chart. My code is below, but the chart I get does not look right (screenshot below). How do I fix this? [Image: bar chart]

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
s2 = pd.Series(np.random.randint(100,1000,size=(30)),index=dti)
df4 = s2.to_frame(name='count')
print('\ndf4:')
print(df4)
print(type(df4))
f2 = plt.figure("Quarterly",figsize=(10,5))
ax = plt.subplot(1,1,1)
ax.bar(df4.index,df4['count'])
plt.tight_layout()
plt.show()

Unfortunately, matplotlib's bar plots don't seem to play well with pandas dates.

In theory, matplotlib expresses bar widths in days. But if you try something like ax.bar(df4.index, df4['count'], width=30), you get extremely wide bars that almost completely fill the plot. Experimenting with the width, something odd happens: when width is smaller than 2, it looks like it is expressed in days, but when width is larger than 2, the bars suddenly jump to something much wider.

On my system (matplotlib 3.1.2, pandas 0.25.3, Windows) it looks like this: [Image: default plot]
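For reference, matplotlib stores dates on a date axis as floating-point day counts, which is why a plain numeric bar width is meant to be read as days. A quick check with matplotlib.dates.date2num sketches this; the specific dates are only illustrative:

```python
import datetime as dt

import matplotlib.dates as mdates

# matplotlib represents dates as floats counting days since its epoch
day0 = mdates.date2num(dt.datetime(2012, 12, 31))
day1 = mdates.date2num(dt.datetime(2013, 1, 1))
quarter_end = mdates.date2num(dt.datetime(2013, 3, 31))

one_day = day1 - day0             # consecutive days differ by exactly 1.0
one_quarter = quarter_end - day0  # Jan + Feb + Mar 2013 = 90 days
print(one_day, one_quarter)
```

So a width of 1.0 spans one day, and a bar covering a whole quarter needs a width of about 90.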

A workaround is to use the bar plots from pandas. These appear to treat the x values as categories, with one tick per bar, but the ticks get labelled with a full timestamp including hours, minutes and seconds. You can relabel them, for example like this:

df4.plot.bar(y='count', width=0.9, ax=ax)
plt.xticks(range(len(df4.index)),
           [t.to_pydatetime().strftime("%b '%y") for t in df4.index],
           rotation=90)
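A self-contained sketch of this pandas workaround follows. It assumes a non-interactive Agg backend, builds the quarterly index with pd.offsets.QuarterEnd() (the offset behind the 'Q' alias, which newer pandas spells 'QE'), and formats all labels at once with DatetimeIndex.strftime instead of converting each timestamp individually:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
# QuarterEnd() is the offset the 'Q' alias refers to (quarters ending in December)
dti = pd.date_range('2012-12-31', periods=30, freq=pd.offsets.QuarterEnd())
df4 = pd.Series(np.random.randint(100, 1000, size=30), index=dti).to_frame(name='count')

fig, ax = plt.subplots(figsize=(10, 5))
# pandas draws categorical bars at x positions 0, 1, ..., n-1
df4.plot.bar(y='count', width=0.9, ax=ax, legend=False)

# Format every timestamp in one call, e.g. "Dec '12"
labels = list(df4.index.strftime("%b '%y"))
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=90)
fig.tight_layout()
```

Because the bars are categorical, every bar has the same width regardless of how many days lie between the quarters.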

Investigating further, the inconsistent jumping of matplotlib's bar width seems related to the frequency built into pandas timestamps. So a solution could be to convert the dates to matplotlib dates. Trying this, the widths do get expressed consistently in days.
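A minimal sketch of converting to matplotlib dates, assuming an Agg backend and pd.offsets.QuarterEnd() for the index. The timestamps become plain floats via date2num, so a scalar width is simply a number of days, and xaxis_date() tells the axis to label those floats as dates again:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq=pd.offsets.QuarterEnd())
counts = np.random.randint(100, 1000, size=30)

# Plain floats (days since matplotlib's epoch) carry no pandas frequency
x = mdates.date2num(dti.to_pydatetime())

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(x, counts, width=80)  # 80 is now simply 80 days
ax.xaxis_date()  # format the float positions as dates on the x axis
fig.tight_layout()
```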

Unfortunately, quarterly dates are not all the same number of days apart, so some bars end up too wide and others too narrow. A solution to this next problem is to calculate the number of days for each bar explicitly. To get a clean separation between the bars, it helps to draw their edges in white.

x = [t.date() for t in df4.index]  # convert the pandas Timestamps to plain datetime.date objects
widths = [t1 - t0 for t0, t1 in zip(x, x[1:])]  # time differences between consecutive dates
widths += [widths[-1]]  # the very last bar has no successor; just repeat the last width
ax.bar(x, df4['count'], width=widths, edgecolor='white')

[Image: resulting plot]
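Put together, the explicit-width approach might look like the following sketch, which assumes an Agg backend and builds the index with pd.offsets.QuarterEnd(). Each width is a datetime.timedelta between consecutive quarter ends, so the values vary between 90 and 92 days:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq=pd.offsets.QuarterEnd())
df4 = pd.Series(np.random.randint(100, 1000, size=30), index=dti).to_frame(name='count')

# Plain datetime.date objects are handled by matplotlib's own date converter
x = [t.date() for t in df4.index]
# Gap to the next quarter end, as a datetime.timedelta (90 to 92 days)
widths = [t1 - t0 for t0, t1 in zip(x, x[1:])]
widths.append(widths[-1])  # the last bar has no successor; reuse the previous gap

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(x, df4['count'], width=widths, edgecolor='white')
fig.tight_layout()
```

Bars are centred on their dates by default; passing align='edge' would instead make each bar start at its own quarter end and extend to the next one.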

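You can set the width of the bars with the width argument of ax.bar() to a value larger than the default of 0.8. On a date axis that default means 0.8 days, which is very narrow compared with the 90 to 92 days between quarterly dates.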

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
s2 = pd.Series(np.random.randint(100,1000,size=(30)),index=dti)
df4 = s2.to_frame(name='count')
f2 = plt.figure("Quarterly",figsize=(10,5))
ax = plt.subplot(1,1,1)
ax.bar(df4.index,df4['count'], width=70)
plt.tight_layout()
plt.show()

[Image: resulting plot]

In this case the width is interpreted as a number of days.
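To illustrate, here is a small sketch using plain Python datetime objects rather than pandas timestamps (an assumption; the dates and heights are made up for illustration). Both a plain number and a datetime.timedelta end up as bars 70 days wide:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

# Illustrative data only
dates = [dt.datetime(2013, 3, 31), dt.datetime(2013, 6, 30), dt.datetime(2013, 9, 30)]
heights = [300, 500, 400]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# A plain number on a date axis is taken as a count of days
scalar_bars = ax1.bar(dates, heights, width=70)
# A timedelta is converted to days as well
delta_bars = ax2.bar(dates, heights, width=dt.timedelta(days=70))
fig.tight_layout()
```

The timedelta form states the unit explicitly, which may be the less surprising choice when the x values come from pandas.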


Edit

For some reason, the above only works correctly with older versions of matplotlib (tested with 2.2.3). To make it work with the current version (3.1.2), the following modification is needed:

# ...
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
dti = [pd.to_datetime(t) for t in dti]
# ...

This then produces the correct bar widths.
