Groupby multiple columns and extract top x rows based on column value
Group by the 'month' and 'userid' columns and, within each group, keep only as many rows as the 'cntr' column specifies, dropping the rest.
The input DataFrame is:
import pandas as pd
data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
'userid': ['2345','2345','2345','2345', '2345','2345','2345','5678', '5678', '5678'],
'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data = data, columns = ['month','userid','cntr'])
print(df)
It looks like this:
month userid cntr
0 Jan 2345 3
1 Jan 2345 3
2 Jan 2345 3
3 Jan 2345 3
4 Feb 2345 1
5 Feb 2345 1
6 Feb 2345 1
7 Feb 5678 2
8 Feb 5678 2
9 Feb 5678 2
Required output:
month userid cntr
0 Jan 2345 3
1 Jan 2345 3
2 Jan 2345 3
3 Feb 2345 1
4 Feb 5678 2
5 Feb 5678 2
Use a custom lambda function in GroupBy.apply with DataFrame.head:
# if necessary, convert 'cntr' to integers
df['cntr'] = df['cntr'].astype(int)
# keep the first 'cntr' rows of each group ('cntr' is constant within a group)
f = lambda x: x.head(x['cntr'].iat[0])
df = df.groupby(['month','userid'], sort=False).apply(f).reset_index(drop=True)
print(df)
month userid cntr
0 Jan 2345 3
1 Jan 2345 3
2 Jan 2345 3
3 Feb 2345 1
4 Feb 5678 2
5 Feb 5678 2
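The same idea can be spelled out with an explicit loop over the groups, which avoids depending on how GroupBy.apply treats the grouping columns. This is only a minimal sketch: it assumes 'cntr' is constant within each (month, userid) group, and the name `result` is chosen here for illustration.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data, columns=['month', 'userid', 'cntr'])
df['cntr'] = df['cntr'].astype(int)

# For each group, keep the first n rows, where n is that group's 'cntr' value.
# sort=False keeps the groups in order of first appearance.
pieces = [group.head(int(group['cntr'].iat[0]))
          for _, group in df.groupby(['month', 'userid'], sort=False)]
result = pd.concat(pieces).reset_index(drop=True)
print(result)
```

The list comprehension does what the lambda does, one group at a time, so every column of the original frame is carried through unchanged.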
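You can do this more simply with GroupBy.head, passing the 'cntr' values as the row count (note that GroupBy.head is documented to take an integer n, so passing a Series may not work on every pandas version):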
In [3446]: df = df.groupby(['month','userid']).head(df.cntr.astype(int))
In [3447]: df
Out[3447]:
month userid cntr
0 Jan 2345 3
1 Jan 2345 3
2 Jan 2345 3
4 Feb 2345 1
7 Feb 5678 2
8 Feb 5678 2
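Because GroupBy.head is documented with an integer n, a variant that may be more portable is to number the rows within each group with GroupBy.cumcount and keep those whose position is below 'cntr'. This is a sketch under the same assumptions as above, not the answerer's original code; the names `position` and `result` are chosen here for illustration.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data, columns=['month', 'userid', 'cntr'])

# 0-based position of each row inside its (month, userid) group
position = df.groupby(['month', 'userid']).cumcount()

# keep a row only if its position is smaller than the requested count
result = df[position < df['cntr'].astype(int)]
print(result)
```

As with the GroupBy.head version, the original index labels are preserved; append .reset_index(drop=True) if a fresh 0..n-1 index is wanted.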