Pandas DataFrame - convert months to datetime and iteratively select data from multiple columns for plotting
Say I have a pandas DataFrame with the format:
Month Thing1 Thing2 Tot
0 Jan-12 A Z 0.005880
1 Jan-12 A Z 0.024500
...
20 Jan-12 B Y 0.001533
21 Jan-12 C X 0.003892
22 Jan-12 C X 0.001680
23 Jan-12 C X 0.001680
24 Jan-12 C X 0.001680
25 Jan-12 C X 0.001680
26 Jan-12 A W 0.001680
27 Jan-12 D V 0.013440
28 Jan-12 E U 0.001680
...
The Month column goes until Apr-14. I am trying to plot line graphs of the monthly totals for each item in Thing1 and Thing2.
I am attempting this using groupby:
a=pd.read_csv('all2.csv')
sums=a.groupby([u'Month',u'Thing1',u'Thing2']).sum()
which gives me:
Apr-12 A W 6.427773
Z 4.347471
B T 7.062425
Y 17.183562
C X 14.583337
D V 0.114450
E U 0.008050
F Q 0.000490
R 0.004468
G P 0.010932
...
However, the months come out in alphabetical order. My questions are:
How can I get pandas to treat the Month column as a datetime object?
How can I iterate through the Thing1 column and plot monthly time-series totals for each item in Thing2?
I imagine there is a way to reorganise the DataFrame such that a simple call to plot() will do the job?
This is because your Month column does not have the right dtype: it holds strings, so the groups are sorted alphabetically. You can get the intended result by first converting the Month column to datetime:
df['Month'] = pd.to_datetime(df.Month)
before calling df.groupby([u'Month', u'Thing1', u'Thing2']).sum().
But be careful: pandas doesn't know whether Jan-12 means 2014-01-12 or 2012-01, and by default it converts your data to the former. To get the latter, pass the format='%b-%y' argument to pd.to_datetime.
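As a minimal sketch of the conversion described above, the following uses a few made-up rows in the same layout as the question's data (the values are hypothetical, not taken from all2.csv). It assumes an English locale, since %b matches abbreviated month names such as Jan and Apr.

```python
import pandas as pd

# Hypothetical sample rows in the same layout as the question's data.
df = pd.DataFrame({
    "Month": ["Jan-12", "Apr-12", "Feb-12", "Jan-12"],
    "Thing1": ["A", "A", "A", "B"],
    "Thing2": ["Z", "Z", "Z", "Y"],
    "Tot": [0.5, 1.0, 0.25, 2.0],
})

# %b = abbreviated month name, %y = two-digit year.
# No day is given, so each value becomes the first day of that month.
df["Month"] = pd.to_datetime(df["Month"], format="%b-%y")

# With a datetime Month column, the grouped index sorts chronologically.
sums = df.groupby(["Month", "Thing1", "Thing2"])["Tot"].sum()
print(sums)
```

With the explicit format, Apr-12 sorts after Feb-12 instead of before Jan-12, because groupby sorts its keys as dates rather than as strings.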
For your second question, you can get the values of the Thing1 level with dfgb.index.get_level_values(1), where dfgb is the DataFrame returned by groupby. Then you can plot the time series with:
# .unique() so that each Thing1 item is plotted only once
for item in dfgb.index.get_level_values(1).unique():
    dfgb.xs(item, level=1).plot(kind='bar')  # for a bar graph
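On the question of reorganising the data so that a plain plot() call works, one possible approach (a sketch, not part of the original answer) is to select each Thing1 item with xs and then unstack Thing2 into columns. The sample rows and the name tables are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows in the same layout as the question's data.
df = pd.DataFrame({
    "Month": ["Jan-12", "Jan-12", "Feb-12", "Feb-12", "Feb-12"],
    "Thing1": ["A", "A", "A", "A", "B"],
    "Thing2": ["Z", "W", "Z", "W", "Y"],
    "Tot": [1.0, 2.0, 3.0, 4.0, 5.0],
})
df["Month"] = pd.to_datetime(df["Month"], format="%b-%y")

sums = df.groupby(["Month", "Thing1", "Thing2"])["Tot"].sum()

# One wide table per Thing1 item: rows are months, columns are Thing2 items.
tables = {}
for item in sums.index.get_level_values("Thing1").unique():
    tables[item] = sums.xs(item, level="Thing1").unstack("Thing2")

print(tables["A"])
```

Calling tables["A"].plot() on such a table should then draw one line per Thing2 column against the month, provided matplotlib is installed.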