How can I apply groupby twice on a pandas DataFrame?
I have a pandas DataFrame with the columns 'year', 'month' and 'transaction id'. I want to get the transaction count of every month for every year. For example, my data looks like this:
year: {2015,2015,2015,2016,2016,2017}
month: {1, 1, 2, 2, 2, 1}
tid: {123, 343, 453, 675, 786, 332}
I want the output to give, for every year, the number of transactions per month. For example, for the year 2015 I would get:
month: [1,2]
count: [2,1]
I used groupby('year'), but how can I get the per-month transaction count after that?
You need to groupby both columns, year and month, and then aggregate with size:
import pandas as pd

year = [2015,2015,2015,2016,2016,2017]
month = [1, 1, 2, 2, 2, 1]
tid = [123, 343, 453, 675, 786, 332]
df = pd.DataFrame({'year':year, 'month':month,'tid':tid})
print (df)
month tid year
0 1 123 2015
1 1 343 2015
2 2 453 2015
3 2 675 2016
4 2 786 2016
5 1 332 2017
df1 = df.groupby(['year','month'])['tid'].size().reset_index(name='count')
print (df1)
year month count
0 2015 1 2
1 2015 2 1
2 2016 2 2
3 2017 1 1
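To get the per-year view the question asks for (the months and counts for 2015 only), one option is to keep the grouped result as a Series and select a single year from its index. This is a minimal sketch that rebuilds the same sample data; the names `counts` and `counts_2015` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2015, 2015, 2015, 2016, 2016, 2017],
    'month': [1, 1, 2, 2, 2, 1],
    'tid': [123, 343, 453, 675, 786, 332],
})

# Count rows per (year, month) pair; the result is a Series
# whose index has two levels, year and month.
counts = df.groupby(['year', 'month']).size()

# Selecting a value of the first level leaves a Series indexed by month.
counts_2015 = counts.loc[2015]
print(counts_2015.index.tolist())  # months present in 2015
print(counts_2015.tolist())        # transaction count for each of those months
```

Leaving out reset_index keeps year and month in the index, which makes this kind of per-year lookup a single .loc call.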
Another option, for more complex tasks: suppose you want to group by "year" and by a function applied to "tid", e.g. a bucket categorization:
def tidBucket(x):
    if x < 300:
        return "low"
    if x < 700:
        return "medium"
    return "high"
Then the above solution would not work directly, because the bucket is not a column of the DataFrame. You could solve the problem by first grouping by year and then iterating over the groupby object, applying another groupby to each group:
gb = df.groupby(by='year')
for _, df1 in gb:
    # index each group by tid so that groupby applies tidBucket to the tid values
    df1 = df1.set_index("tid", drop=False)
    df1 = df1.groupby(by=tidBucket)
Then aggregate each group as desired. Alternatively, you could create an additional "bucket" column:
df["bucket"] = df["tid"].map(tidBucket)
and then follow @jezrael's solution.
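Put together, the bucket-column approach might look like the sketch below. It assumes the same sample data as the accepted answer; the function is renamed `tid_bucket` here and `bucket_counts` is an illustrative name, neither is part of the original answers.

```python
import pandas as pd

def tid_bucket(x):
    # Same thresholds as tidBucket above.
    if x < 300:
        return "low"
    if x < 700:
        return "medium"
    return "high"

df = pd.DataFrame({
    'year': [2015, 2015, 2015, 2016, 2016, 2017],
    'month': [1, 1, 2, 2, 2, 1],
    'tid': [123, 343, 453, 675, 786, 332],
})

# Materialize the bucket as a regular column ...
df["bucket"] = df["tid"].map(tid_bucket)

# ... so that it can be used as a second grouping key next to year.
bucket_counts = (
    df.groupby(["year", "bucket"])
      .size()
      .reset_index(name="count")
)
print(bucket_counts)
```

This avoids the explicit loop over the per-year groups, at the cost of one extra column on the DataFrame.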