Groupby multiple columns and extract top x rows based on column value

Group by the 'month' and 'userid' columns and, within each group, keep only as many rows as the value in the 'cntr' column specifies, deleting the rest.

The input DataFrame is:

import pandas as pd
data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
   'userid': ['2345','2345','2345','2345', '2345','2345','2345','5678', '5678', '5678'],
   'cntr': ['3','3','3','3','1','1','1','2','2','2']}

df = pd.DataFrame(data = data, columns = ['month','userid','cntr'])
print(df)

It looks like this:

     month    userid     cntr
0     Jan        2345     3
1     Jan        2345     3 
2     Jan        2345     3
3     Jan        2345     3
4     Feb        2345     1
5     Feb        2345     1
6     Feb        2345     1
7     Feb        5678     2  
8     Feb        5678     2
9     Feb        5678     2

Required output:

     month    userid     cntr
0     Jan        2345     3
1     Jan        2345     3 
2     Jan        2345     3
3     Feb        2345     1
4     Feb        5678     2  
5     Feb        5678     2

Use a custom lambda function in GroupBy.apply with DataFrame.head:

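# 'cntr' holds strings in the sample data; convert to integers so it can be passed to head()
df['cntr'] = df['cntr'].astype(int)

# for each group, keep the first n rows, where n is the group's (constant) 'cntr' value
f = lambda x: x.head(x['cntr'].iat[0])
# sort=False keeps the groups in order of first appearance (Jan before Feb)
df = df.groupby(['month','userid'], sort=False).apply(f).reset_index(drop=True)
print(df)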
  month userid  cntr
0   Jan   2345     3
1   Jan   2345     3
2   Jan   2345     3
3   Feb   2345     1
4   Feb   5678     2
5   Feb   5678     2
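
To make clearer what the lambda does for each group, here is a rough sketch of the same logic written as an explicit loop. It assumes, as in the sample data, that 'cntr' is constant within each (month, userid) group; the names pieces and result are only illustrative.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data)
df['cntr'] = df['cntr'].astype(int)

pieces = []
# sort=False iterates over the groups in order of first appearance
for (month, userid), group in df.groupby(['month', 'userid'], sort=False):
    n = group['cntr'].iat[0]       # assumption: one 'cntr' value per group
    pieces.append(group.head(n))   # keep only the first n rows of this group

# stitch the trimmed groups back together and renumber the rows
result = pd.concat(pieces).reset_index(drop=True)
print(result)
```

The loop is slower than a vectorized approach on large frames, but it shows each step that the apply call performs.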

You can do this more simply using GroupBy.head:

In [3446]: df = df.groupby(['month','userid']).head(df.cntr.astype(int))

In [3447]: df
Out[3447]: 
  month userid cntr
0   Jan   2345    3
1   Jan   2345    3
2   Jan   2345    3
4   Feb   2345    1
7   Feb   5678    2
8   Feb   5678    2
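
Note that GroupBy.head is documented as taking an integer n, so passing a Series as above may not behave the same way in every pandas version. A possible alternative, sketched here with GroupBy.cumcount, compares each row's position within its group against its own 'cntr' value; the names pos and result are only illustrative.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data)
df['cntr'] = df['cntr'].astype(int)

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order
pos = df.groupby(['month', 'userid']).cumcount()

# keep a row only if its position within the group is below that row's 'cntr'
result = df[pos < df['cntr']]
print(result)
```

As with the GroupBy.head output above, the original index labels are preserved; append .reset_index(drop=True) if a fresh 0..n-1 index is wanted.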
