Group by multiple columns and extract the first x rows based on a column value

Question

Group by the 'month' and 'userid' columns and, within each group, keep only as many rows as the 'cntr' column specifies, dropping the rest.

The input DataFrame is:

import pandas as pd
data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
   'userid': ['2345','2345','2345','2345', '2345','2345','2345','5678', '5678', '5678'],
   'cntr': ['3','3','3','3','1','1','1','2','2','2']}

df = pd.DataFrame(data = data, columns = ['month','userid','cntr'])
print(df)

It looks like this:

     month    userid     cntr
0     Jan        2345     3
1     Jan        2345     3 
2     Jan        2345     3
3     Jan        2345     3
4     Feb        2345     1
5     Feb        2345     1
6     Feb        2345     1
7     Feb        5678     2  
8     Feb        5678     2
9     Feb        5678     2

Required output:

     month    userid     cntr
0     Jan        2345     3
1     Jan        2345     3 
2     Jan        2345     3
3     Feb        2345     1
4     Feb        5678     2  
5     Feb        5678     2

Answer 1

Use a custom lambda function in GroupBy.apply together with DataFrame.head:

# If necessary, convert 'cntr' to integers
df['cntr'] = df['cntr'].astype(int)

# Keep the first `cntr` rows of each group
f = lambda x: x.head(x['cntr'].iat[0])
df = df.groupby(['month','userid'], sort=False).apply(f).reset_index(drop=True)
print(df)
  month userid  cntr
0   Jan   2345     3
1   Jan   2345     3
2   Jan   2345     3
3   Feb   2345     1
4   Feb   5678     2
5   Feb   5678     2
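
To make explicit what the lambda does for each group, here is a minimal sketch that iterates over the groups and concatenates the trimmed pieces. It is not the answerer's code; the names `pieces` and `result` are chosen only for illustration, and it assumes every row of a group carries the same 'cntr' value, as in the sample data.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data)
df['cntr'] = df['cntr'].astype(int)

# Visit the groups in order of first appearance (sort=False) and keep
# the first `cntr` rows of each one. Assumption: 'cntr' is constant
# within a group, so reading it from the group's first row is enough.
pieces = [
    group.head(int(group['cntr'].iat[0]))
    for _, group in df.groupby(['month', 'userid'], sort=False)
]

# Stitch the pieces back together with a fresh 0..n-1 index.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Because each iterated group is a full sub-DataFrame, the grouping columns stay in the result without relying on how `apply` treats them.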

Answer 2

You can do this more simply with GroupBy.head:

In [3446]: df = df.groupby(['month','userid']).head(df.cntr.astype(int))

In [3447]: df
Out[3447]: 
  month userid cntr
0   Jan   2345    3
1   Jan   2345    3
2   Jan   2345    3
4   Feb   2345    1
7   Feb   5678    2
8   Feb   5678    2
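
GroupBy.head is documented as taking an integer n, so passing a Series as above may not work on every pandas version. As a hedged alternative (not part of the original answer), the same rows can be selected by comparing each row's position within its group, obtained from GroupBy.cumcount, against its 'cntr' value. The names `position`, `mask`, and `result` are illustrative only.

```python
import pandas as pd

data = {'month': ['Jan','Jan','Jan','Jan','Feb','Feb','Feb','Feb','Feb','Feb'],
        'userid': ['2345','2345','2345','2345','2345','2345','2345','5678','5678','5678'],
        'cntr': ['3','3','3','3','1','1','1','2','2','2']}
df = pd.DataFrame(data)

# 0-based position of each row inside its (month, userid) group.
position = df.groupby(['month', 'userid']).cumcount()

# Keep a row only while its position is below the group's 'cntr' value.
mask = position < df['cntr'].astype(int)
result = df[mask]
print(result)
```

Like the GroupBy.head output above, this keeps the original index labels; append `.reset_index(drop=True)` if a fresh 0-based index is wanted.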

2 answers

Answer 1
1 2020-11-24 14:24:15

Answer 2
1 Accepted 2020-11-24 14:29:11
