Pandas groupby date month and count items within months
I have a dataframe like this:
STYLE | INVOICE_DATE2
A | 2017-01-03
B | 2017-01-03
C | 2017-01-03
A | 2017-02-03
A | 2017-01-03
B | 2017-02-03
B | 2017-01-03
I'm trying to group the rows by month and count each item within each month. The result should look like this:
Month | Item | Count
1 | A | 2
| B | 2
| C | 1
2 | A | 1
| B | 1
I have tried this:
lastyear_df.groupby([(df['INVOICE_DATE2']).dt.month, df['STYLE']])['STYLE'].count()
But it didn't work for me.
Here is a one-liner:
ans = df.groupby([df.INVOICE_DATE2.apply(lambda x: x.month), 'STYLE']).count()
Here is the output:
In [21]: ans
Out[21]:
                     INVOICE_DATE2
INVOICE_DATE2 STYLE
1             A                  2
              B                  2
              C                  1
2             A                  1
              B                  1
NOTE: At this point you have a hierarchical index, which you can flatten by using reset_index:
ans = ans.reset_index(1)
              STYLE  INVOICE_DATE2
INVOICE_DATE2
1                 A              2
1                 B              2
1                 C              1
2                 A              1
2                 B              1
You can now change the column and index names if you like:
ans.index.name = 'MONTH'
ans.columns = ['ITEM', 'COUNT']
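For anyone who wants to reproduce the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes INVOICE_DATE2 may need converting with pd.to_datetime first (the question does not say what dtype the column has). It also uses .dt.month in place of the apply/lambda, which should give the same month numbers.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "STYLE": ["A", "B", "C", "A", "A", "B", "B"],
    "INVOICE_DATE2": ["2017-01-03", "2017-01-03", "2017-01-03", "2017-02-03",
                      "2017-01-03", "2017-02-03", "2017-01-03"],
})

# Assumption: the column may be stored as strings, so convert it explicitly.
df["INVOICE_DATE2"] = pd.to_datetime(df["INVOICE_DATE2"])

# Group by month number and STYLE, then count the rows in each group.
ans = df.groupby([df["INVOICE_DATE2"].dt.month, "STYLE"]).count()

# Move the STYLE level out of the index and rename everything.
ans = ans.reset_index(1)
ans.index.name = "MONTH"
ans.columns = ["ITEM", "COUNT"]
print(ans)
```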
I think you are close. You need size if you want to count NaNs as well:
d = {'INVOICE_DATE2':'Month','STYLE':'Item'}
df = (df.groupby([df['INVOICE_DATE2'].dt.month, 'STYLE'])
        .size()
        .reset_index(name='Count')
        .rename(columns=d))
print (df)
   Month Item  Count
0      1    A      2
1      1    B      2
2      1    C      1
3      2    A      1
4      2    B      1
Or use count to count only non-NaN values:
d = {'INVOICE_DATE2':'Month','STYLE':'Item'}
df = (df.groupby([df['INVOICE_DATE2'].dt.month, 'STYLE'])['STYLE']
        .count()
        .reset_index(name='Count')
        .rename(columns=d))
print (df)
   Month Item  Count
0      1    A      2
1      1    B      2
2      1    C      1
3      2    A      1
4      2    B      1
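With the sample data both variants give the same numbers, because nothing is missing. The difference only shows up when the counted column contains NaN. A small sketch, assuming a hypothetical QTY column that is not part of the question:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "STYLE": ["A", "A", "B"],
    "INVOICE_DATE2": pd.to_datetime(["2017-01-03", "2017-01-03", "2017-02-03"]),
    "QTY": [5.0, np.nan, 7.0],  # hypothetical column with one missing value
})

grouped = demo.groupby([demo["INVOICE_DATE2"].dt.month, "STYLE"])["QTY"]

sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN QTY values per group

print(sizes)
print(counts)
```

Here the (1, A) group has two rows but only one non-missing QTY, so size and count differ for that group.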
Finally, if you need each month to appear only once in the first column:
df['Month'] = df['Month'].mask(df.duplicated('Month'),'')
print (df)
  Month Item  Count
0     1    A      2
1          B      2
2          C      1
3     2    A      1
4          B      1
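Note that blanking the repeated months mixes empty strings into a numeric column, so the result is really only suitable for display. One possible way to handle that, sketched below, is to work on a copy and convert Month to strings before masking. The names out and shown are only illustrative, and the astype(str) step is an addition to the line above, not part of it.

```python
import pandas as pd

# The flattened result from the previous step, rebuilt by hand.
out = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "Item": ["A", "B", "C", "A", "B"],
    "Count": [2, 2, 1, 1, 1],
})

# Work on a copy so the numeric Month column stays usable for later calculations.
shown = out.copy()

# Blank out every repeated month, keeping only the first occurrence.
shown["Month"] = shown["Month"].astype(str).mask(shown.duplicated("Month"), "")
print(shown)
```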