Pandas dataframe group by multiple columns
Given a dataframe with two datetime columns A and B and a numeric column C, how do I group by the month of both A and B and compute sum(C)? For example:
In [1]: df
Out[1]:
A B C
0 2013-01-01 2013-01-01 0.282863
1 2013-01-02 2013-01-01 0.173215
2 2013-02-03 2013-02-04 2.104569
3 2013-02-09 2013-04-15 0.706771
4 2013-03-05 2013-08-01 0.567020
5 2013-03-06 2013-04-01 0.113648
By using groupby
df.groupby([df.A.dt.month,df.B.dt.month]).C.sum()
Out[954]:
A  B
1  1    0.456078
2  2    2.104569
   4    0.706771
3  4    0.113648
   8    0.567020
Name: C, dtype: float64
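A self-contained version of the groupby snippet, rebuilding the sample frame from the question. In this sketch the dates are typed in as strings, so they are converted with pd.to_datetime first:

```python
import pandas as pd

# Sample data from the question; dates start out as strings
df = pd.DataFrame({
    "A": ["2013-01-01", "2013-01-02", "2013-02-03",
          "2013-02-09", "2013-03-05", "2013-03-06"],
    "B": ["2013-01-01", "2013-01-01", "2013-02-04",
          "2013-04-15", "2013-08-01", "2013-04-01"],
    "C": [0.282863, 0.173215, 2.104569, 0.706771, 0.567020, 0.113648],
})
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])

# Group on the month number of each column; the result has a two-level (A, B) index
result = df.groupby([df["A"].dt.month, df["B"].dt.month])["C"].sum()
print(result)
```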
Note: this requires A and B to be datetime columns, because the .dt accessor only works on datetime-like Series. If they are not, run the following before the groupby:
df.A=pd.to_datetime(df.A)
df.B=pd.to_datetime(df.B)
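When it is not known in advance whether the columns are already datetimes, the conversion can be done conditionally. A minimal sketch; the helper name ensure_datetime is hypothetical, while is_datetime64_any_dtype and pd.to_datetime are standard pandas functions:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def ensure_datetime(df, columns):
    """Hypothetical helper: convert the given columns to datetime64 if they are not already."""
    for col in columns:
        if not is_datetime64_any_dtype(df[col]):
            # Strings such as "2013-01-01" are parsed into Timestamps
            df[col] = pd.to_datetime(df[col])
    return df


df = pd.DataFrame({"A": ["2013-01-01", "2013-02-03"],
                   "B": ["2013-01-01", "2013-02-04"],
                   "C": [0.282863, 2.104569]})
ensure_datetime(df, ["A", "B"])
print(df.dtypes)
# The .dt accessor now works on both columns
print(df["A"].dt.month.tolist())
```

Calling it again on columns that are already datetimes leaves them untouched.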
I recently read about pd.Grouper, which makes grouping by dates very easy:
df.A=pd.to_datetime(df.A)
df.B=pd.to_datetime(df.B)
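# freq='M' bins by calendar month, so the year is kept too; pandas >= 2.2 deprecates 'M' here in favor of 'ME'
df.groupby([pd.Grouper(key='A', freq='M'), pd.Grouper(key='B', freq='M')])['C'].sum()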
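A runnable sketch of the pd.Grouper approach on the question's data. It uses freq='MS' (month start) rather than 'M' only because the month-end alias changed to 'ME' in pandas 2.2, while 'MS' is accepted by older and newer versions alike; the grouping is the same, only the bin labels differ (first rather than last day of the month). Unlike .dt.month, these bins also carry the year:

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-02-03",
                         "2013-02-09", "2013-03-05", "2013-03-06"]),
    "B": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-02-04",
                         "2013-04-15", "2013-08-01", "2013-04-01"]),
    "C": [0.282863, 0.173215, 2.104569, 0.706771, 0.567020, 0.113648],
})

# One Grouper per datetime column; 'MS' labels each bin with the first day of its month
monthly = df.groupby(
    [pd.Grouper(key="A", freq="MS"), pd.Grouper(key="B", freq="MS")]
)["C"].sum()
print(monthly)
```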
The number of options this opens up (any date offset alias can be passed as freq) makes it worth looking into:
Source: http://pbpython.com/pandas-grouper-agg.html
Date offset aliases: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
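As one illustration of those aliases, swapping freq to 'QS' (quarter start) groups the same sample data by calendar quarter instead of month. This is only a sketch; any other alias from that page could be substituted:

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-02-03",
                         "2013-02-09", "2013-03-05", "2013-03-06"]),
    "B": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-02-04",
                         "2013-04-15", "2013-08-01", "2013-04-01"]),
    "C": [0.282863, 0.173215, 2.104569, 0.706771, 0.567020, 0.113648],
})

# 'QS' bins each date to the first day of its calendar quarter
quarterly = df.groupby(
    [pd.Grouper(key="A", freq="QS"), pd.Grouper(key="B", freq="QS")]
)["C"].sum()
print(quarterly)
```

All A dates fall in Q1 2013, so the only variation comes from B (Q1, Q2 and Q3).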
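Alternatively, add explicit month columns and group on them:
df['month_A'] = [i.month for i in pd.to_datetime(df.A)]
df['month_B'] = [i.month for i in pd.to_datetime(df.B)]
# Select C before summing: in recent pandas, summing the datetime columns A and B raises a TypeError
df.groupby(['month_A', 'month_B'])['C'].sum()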
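One caveat with month numbers (both with month columns like these and with .dt.month): January 2013 and January 2014 land in the same group. If the data spans several years, grouping on dt.to_period('M') keeps year and month together. A sketch; the three rows below are made-up illustration data, not from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2014-01-15", "2013-02-03"]),
    "B": pd.to_datetime(["2013-01-01", "2014-01-20", "2013-02-04"]),
    "C": [1.0, 2.0, 3.0],
})

# Month number only: the 2013 and 2014 January rows are merged into one group
by_month = df.groupby([df["A"].dt.month, df["B"].dt.month])["C"].sum()

# Year-month periods: the two Januaries stay in separate groups
by_period = df.groupby([df["A"].dt.to_period("M"), df["B"].dt.to_period("M")])["C"].sum()

print(by_month)
print(by_period)
```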
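If you combine this with the following, you get back rows of the original frame, with their actual values in the A and B columns, instead of the aggregated series. Note that the mask keeps only rows whose C equals their group's total, which in practice means rows that are alone in their (month of A, month of B) group; with the sample data, for example, both January rows are dropped.
idsum = df.groupby([df.A.dt.month, df.B.dt.month])['C'].transform('sum') == df['C']
df = df[idsum]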
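If the goal is instead to keep every original row, with its own A and B values, next to its group's total, a different approach (not the one from the answer) is to store the transform result as a new column. A sketch on the question's data; the column name C_sum is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-02-03",
                         "2013-02-09", "2013-03-05", "2013-03-06"]),
    "B": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-02-04",
                         "2013-04-15", "2013-08-01", "2013-04-01"]),
    "C": [0.282863, 0.173215, 2.104569, 0.706771, 0.567020, 0.113648],
})

keys = [df["A"].dt.month, df["B"].dt.month]
# transform('sum') returns one value per original row: the total of that row's group
df["C_sum"] = df.groupby(keys)["C"].transform("sum")
print(df)
```

No rows are dropped, so both January rows keep their own dates and share the group total 0.456078.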