Plot year over year on a 12-month axis
I want to plot 6 years of 12-month period data on one 12-month axis from Dec - Jan.
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
df = pd.Series(np.random.randn(72), index=pd.date_range('1/1/2000', periods=72, freq='M'))
# display(df.head())
2000-01-31 0.713724
2000-02-29 0.416233
2000-03-31 -0.147765
2000-04-30 0.141021
2000-05-31 0.966261
Freq: M, dtype: float64
grouped = df.groupby(df.index.map(lambda x: x.year))
grouped.plot()
I'm getting breaks in the lines between each year. However, what I want is to have the years stacked over each other. Are there any simple and clean ways to do it?
There's probably a better way than this:
In [44]: vals = df.groupby(lambda x: (x.year, x.month)).sum()
In [45]: vals
Out[45]:
(2000, 1) -0.235044
(2000, 2) -1.196815
(2000, 3) -0.370850
(2000, 4) 0.719915
(2000, 5) -1.228286
(2000, 6) -0.192108
(2000, 7) -0.337032
(2000, 8) -0.174219
(2000, 9) 0.605742
(2000, 10) 1.061558
(2000, 11) -0.683674
(2000, 12) -0.813779
(2001, 1) 2.103178
(2001, 2) -1.099845
(2001, 3) 0.366811
...
(2004, 10) -0.905740
(2004, 11) -0.143628
(2004, 12) 2.166758
(2005, 1) 0.944993
(2005, 2) -0.741785
(2005, 3) 1.531754
(2005, 4) -1.106024
(2005, 5) -1.925078
(2005, 6) 0.400930
(2005, 7) 0.321962
(2005, 8) -0.851656
(2005, 9) 0.371305
(2005, 10) -0.868836
(2005, 11) -0.932977
(2005, 12) -0.530207
Length: 72, dtype: float64
Now change the index on vals to a MultiIndex:
In [46]: vals.index = pd.MultiIndex.from_tuples(vals.index)
In [47]: vals.head()
Out[47]:
2000 1 -0.235044
2 -1.196815
3 -0.370850
4 0.719915
5 -1.228286
dtype: float64
Then unstack and plot:
In [48]: vals.unstack(0).plot()
Out[48]: <matplotlib.axes.AxesSubplot at 0x1171a2dd0>
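The same reshaping can be sketched more compactly by grouping on the year and month arrays directly, which yields a MultiIndex without the tuple conversion step. This is only an illustrative variant, not part of the original answer: the seeded random data, the month-start dates, and the names s, vals, and wide are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: 72 monthly values covering 2000 through 2005.
# Month-start dates ("MS") are used here only for the sketch.
rng = np.random.default_rng(0)
s = pd.Series(
    rng.standard_normal(72),
    index=pd.date_range("2000-01-01", periods=72, freq="MS"),
)

# Grouping by two arrays produces a (year, month) MultiIndex directly.
vals = s.groupby([s.index.year, s.index.month]).sum()
vals.index.names = ["year", "month"]

# Move the year level into the columns: one row per month, one column per year.
wide = vals.unstack("year")
```

Calling wide.plot() would then draw one line per year over the month numbers 1 to 12.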
- I think it is clearer and easier to transform a pandas.DataFrame, not a pandas.Series.
  - The sample data in the question is a pandas.Series, but people looking to solve this problem will more typically begin with a pandas.DataFrame, so we'll begin by using .to_frame()
- Extract the month and year components of the datetime index.
  - The index must be a datetime dtype; if your data is not, use pd.to_datetime() to convert the date index / column.
  - Use the .dt accessor to get month and year from a column (e.g. df[col].dt.year), or read them from the index directly (e.g. df.index.year).
- Use pandas.pivot_table to transform the dataframe from a long to wide format, and aggregate the data (e.g. 'sum', 'mean', etc.).
  - If no 'month' value is repeated within a year, so no aggregation is required, use pandas.DataFrame.pivot instead.
- Plot with pandas.DataFrame.plot.
- Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2
import pandas as pd
# for this OP convert the Series to a DataFrame
df = df.to_frame()
# extract month and year from the index and create columns
df['month'] = df.index.month
df['year'] = df.index.year
# display(df.head(3))
0 month year
2000-01-31 0.167921 1 2000
2000-02-29 0.523505 2 2000
2000-03-31 0.817376 3 2000
# transform the dataframe to a wide format
dfp = pd.pivot_table(data=df, index='month', columns='year', values=0, aggfunc='sum')
# display(dfp.head(3))
year 2000 2001 2002 2003 2004 2005
month
1 0.167921 0.637999 -0.174122 0.620622 -0.854315 -1.523579
2 0.523505 -0.344658 -0.280819 0.845543 0.782439 -0.593732
3 0.817376 -0.004282 -0.907424 0.352655 1.258275 -0.624112
# plot; use xticks=dfp.index so every month number is displayed
ax = dfp.plot(ylabel='Aggregated Sum', figsize=(6, 4), xticks=dfp.index)
# reposition the legend
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
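The list above also mentions two cases the example does not exercise: dates that are not yet a datetime dtype, and data with no repeated months, where pandas.DataFrame.pivot suffices. A minimal sketch of both, assuming a hypothetical frame named raw with string dates in a 'date' column and numbers in a 'value' column (none of these names come from the original post):

```python
import pandas as pd

# Hypothetical long-format data: dates stored as strings in a regular column.
raw = pd.DataFrame({
    "date": ["2000-01-31", "2000-02-29", "2001-01-31", "2001-02-28"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Convert to datetime so the .dt accessor becomes available.
raw["date"] = pd.to_datetime(raw["date"])
raw["month"] = raw["date"].dt.month
raw["year"] = raw["date"].dt.year

# Each (month, year) pair occurs once, so pivot works without an aggregation.
wide = raw.pivot(index="month", columns="year", values="value")
```

If a (month, year) pair occurred more than once, pivot would raise an error, and pandas.pivot_table with an aggfunc would be the appropriate choice.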
To label the x-axis with month abbreviations instead, create the 'month' column with df['month'] = df.index.strftime('%b'), which gets the month abbreviation:
from calendar import month_abbr  # month name abbreviations in calendar order; month_abbr[0] is an empty string
# for this OP convert the Series to a DataFrame
df = df.to_frame()
# extract the month abbreviation
df['month'] = df.index.strftime('%b')
df['year'] = df.index.year
# transform
dfp = pd.pivot_table(data=df, index='month', columns='year', values=0, aggfunc='sum')
# reorder the dfp index so the x-axis will be in calendar order
dfp = dfp.loc[month_abbr[1:]]
# display(dfp.head(3))
year 2000 2001 2002 2003 2004 2005
month
Jan 0.167921 0.637999 -0.174122 0.620622 -0.854315 -1.523579
Feb 0.523505 -0.344658 -0.280819 0.845543 0.782439 -0.593732
Mar 0.817376 -0.004282 -0.907424 0.352655 1.258275 -0.624112
# plot; using xticks=range(12) will result in all the xticks being labeled with a month, otherwise not all ticks will be displayed
ax = dfp.plot(ylabel='Aggregated Sum', figsize=(6, 4), xticks=range(12))
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
ax = dfp.plot(kind='bar', ylabel='Aggregated Sum', figsize=(12, 4), rot=0)
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
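Reordering with dfp.loc[month_abbr[1:]] assumes that every month abbreviation is present in the index. As a hedged alternative, not taken from the original answer, reindex gives the same calendar order while leaving NaN rows for months with no data. The small frame and the 'value' column name below are assumptions made for illustration.

```python
from calendar import month_abbr

import pandas as pd

# Illustrative data with gaps: only January and March are present.
dates = pd.to_datetime(["2000-01-31", "2000-03-31", "2001-01-31", "2001-03-31"])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=dates)
df["month"] = df.index.strftime("%b")
df["year"] = df.index.year

dfp = pd.pivot_table(data=df, index="month", columns="year", values="value", aggfunc="sum")

# month_abbr[0] is an empty string, so skip it to get the 12 abbreviations.
months = list(month_abbr)[1:]

# reindex tolerates missing labels and fills their rows with NaN.
dfp = dfp.reindex(months)
```

Both strftime('%b') and month_abbr depend on the active locale, so the labels match as long as they are produced under the same locale.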