Pandas groupby with a lambda and membership in a list

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'ItemType': ['Red', 'White', 'Red', 'Blue', 'White', 'White', 'White', 'Green'],
                   'ItemPrice': [10, 11, 12, 13, 14, 15, 16, 17],
                   'ItemID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']})

I would like to get the records (rows) for every ItemID whose rows contain only the "White" ItemType, returned as a DataFrame.

I have attempted the following solution:

types = ['Red','Blue','Green']

~df.groupby('ItemID')['ItemType'].any().apply(lambda u: u in(types))

But this gives me an incorrect result (D should be False), and it comes back as a Series rather than a DataFrame:

A False
B False
C True
D True

Thank you!

Answer 1

You should avoid using apply here, as it is usually quite slow. Instead, assign a flag column before you groupby, and then use all to check that none of a group's values are in types:

df.assign(flag=~df.ItemType.isin(types)).groupby('ItemID').flag.all()

ItemID
A    False
B    False
C     True
D    False
Name: flag, dtype: bool
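
To make the two stages of that one-liner easier to follow, here is a minimal sketch that splits it into a row-level step and a group-level step. The variable names flagged and only_allowed are introduced here for illustration only; df and types are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ItemType': ['Red', 'White', 'Red', 'Blue', 'White', 'White', 'White', 'Green'],
    'ItemPrice': [10, 11, 12, 13, 14, 15, 16, 17],
    'ItemID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
})
types = ['Red', 'Blue', 'Green']

# Step 1 (row level): True where the row's ItemType is NOT one of `types`.
flagged = df.assign(flag=~df['ItemType'].isin(types))

# Step 2 (group level): an ItemID passes only if every one of its rows is True.
only_allowed = flagged.groupby('ItemID')['flag'].all()

print(flagged[['ItemID', 'ItemType', 'flag']])
print(only_allowed)
```

Because isin is vectorized over the whole column, the per-group work reduces to a built-in all aggregation instead of a Python-level lambda.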

However, just to demonstrate the logic of the operation and show what was incorrect about your approach, here is a working version using apply:

~df.groupby('ItemID').ItemType.apply(lambda x: any(i in types for i in x))

You need to call any inside the lambda, rather than on the grouped Series before using apply.
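
As a self-contained sketch of that point: with apply on the grouped column, the lambda receives one group's ItemType values at a time, so any can inspect every value in the group. The names has_other and only_white are illustrative, and the astype(bool) call is a defensive assumption so that ~ always acts as a logical NOT.

```python
import pandas as pd

df = pd.DataFrame({
    'ItemType': ['Red', 'White', 'Red', 'Blue', 'White', 'White', 'White', 'Green'],
    'ItemPrice': [10, 11, 12, 13, 14, 15, 16, 17],
    'ItemID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
})
types = ['Red', 'Blue', 'Green']

# `x` is the Series of ItemType strings belonging to a single ItemID.
# any(...) runs once per group and checks each value against `types`.
has_other = (
    df.groupby('ItemID')['ItemType']
      .apply(lambda x: any(i in types for i in x))
      .astype(bool)
)

# Invert: True only for ItemIDs with no value in `types`.
only_white = ~has_other
print(only_white)
```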

To access the rows where this condition is met, you may use transform:

df[df.assign(flag=~df.ItemType.isin(types)).groupby('ItemID').flag.transform('all')]

  ItemType  ItemPrice ItemID
4    White         14      C
5    White         15      C
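
The reason transform works as a row filter is that it broadcasts each group's result back onto that group's original rows, so the mask has the same index as df. A small sketch of the difference from a plain aggregation follows; per_group, per_row, and result are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ItemType': ['Red', 'White', 'Red', 'Blue', 'White', 'White', 'White', 'Green'],
    'ItemPrice': [10, 11, 12, 13, 14, 15, 16, 17],
    'ItemID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
})
types = ['Red', 'Blue', 'Green']

flag = ~df['ItemType'].isin(types)

# Aggregation: one value per ItemID (indexed by ItemID).
per_group = flag.groupby(df['ItemID']).all()

# Transform: one value per original row (indexed like df),
# so it can be used directly as a boolean mask.
per_row = flag.groupby(df['ItemID']).transform('all')

print(len(per_group), len(per_row))
result = df[per_row]
print(result)
```

Grouping the flag Series by df['ItemID'] directly avoids the temporary column created by assign; both spellings should select the same rows.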

Answer 2

An alternative method is to compute an array of the ItemID values that have at least one non-White row. Then filter your DataFrame:

non_whites = df.loc[df['ItemType'].ne('White'), 'ItemID'].unique()

res = df[~df['ItemID'].isin(non_whites)]

print(res)

  ItemType  ItemPrice ItemID
4    White         14      C
5    White         15      C

You can also use GroupBy, but it's not absolutely necessary.
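
If this filter is needed in more than one place, the same idea can be wrapped in a small function. The sketch below is only one possible packaging: rows_with_only and its parameters are hypothetical names, not part of pandas. It also cross-checks the result against a GroupBy-based mask.

```python
import pandas as pd

def rows_with_only(frame, allowed, type_col='ItemType', id_col='ItemID'):
    """Hypothetical helper: keep rows whose id has only `allowed` types."""
    # IDs that own at least one row with a disallowed type.
    bad_ids = frame.loc[~frame[type_col].isin(allowed), id_col].unique()
    # Keep every row whose id is not among them.
    return frame[~frame[id_col].isin(bad_ids)]

df = pd.DataFrame({
    'ItemType': ['Red', 'White', 'Red', 'Blue', 'White', 'White', 'White', 'Green'],
    'ItemPrice': [10, 11, 12, 13, 14, 15, 16, 17],
    'ItemID': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
})

res = rows_with_only(df, ['White'])
print(res)

# Cross-check against a GroupBy-based mask.
mask = df['ItemType'].eq('White').groupby(df['ItemID']).transform('all')
same = res.equals(df[mask])
print(same)
```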

Answer 1
2 Accepted 2018-09-10 22:44:01

Answer 2
1 2018-09-10 22:58:21
