pandas groupby filter: dropping some groups

Question

I have a groupby object:

grouped = df.groupby('name')
for k, group in grouped:
    print(group)

There are 3 groups: bar, foo and foobar.

  name  time  
2  bar     5  
3  bar     6  


  name  time  
0  foo     5  
1  foo     2  

  name      time  
4  foobar     20  
5  foobar     1

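I need to filter these groups and drop every group in which no time is greater than 5. In my example, the group foo should be dropped. I am trying to do this with the filter() function: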

grouped.filter(lambda x: (x.max()['time']>5))

But x is apparently not just the group in DataFrame form.

Answer 1

Assuming your last line of code should actually be >5 rather than >20, you would do something like this:

grouped.filter(lambda x: (x.time > 5).any())

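As you correctly found, x is actually a DataFrame holding all the rows whose name column matches the key k from your for loop.

So, to filter on whether any value in the time column is greater than 5, use the (x.time > 5).any() test shown above.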
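A minimal, self-contained sketch of this filter follows. The DataFrame construction is an assumption reconstructed from the groups printed in the question, and the variable name kept is only illustrative.

```python
import pandas as pd

# Data reconstructed from the groups printed in the question (assumption).
df = pd.DataFrame({
    "name": ["foo", "foo", "bar", "bar", "foobar", "foobar"],
    "time": [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby("name")

# filter() calls the function once per group, passing that group's rows as a
# DataFrame, and keeps the rows of every group for which it returns True.
kept = grouped.filter(lambda x: (x["time"] > 5).any())
print(kept)
```

The rows of bar and foobar survive because each of those groups has at least one time above 5; the whole foo group is dropped.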

Answer 2

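I am not yet used to Python, NumPy or pandas, but I have been working on a solution to a similar problem, so let me report my answers using this question as an example.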

import pandas as pd

df = pd.DataFrame()
df['name'] = ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar']
df['time'] = [5, 2, 5, 6, 20, 1]

grouped = df.groupby('name')
for k, group in grouped:
    print(group)

My answer 1:

indexes_should_drop = grouped.filter(lambda x: (x['time'].max() <= 5)).index
result1 = df.drop(index=indexes_should_drop)

My answer 2:

filter_time_max = grouped['time'].max() > 5
groups_should_keep = filter_time_max.loc[filter_time_max].index
result2 = df.loc[df['name'].isin(groups_should_keep)]

My answer 3:

filter_time_max = grouped['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index
result3 = df.drop(df[df['name'].isin(groups_should_drop)].index)

Result

    name    time
2   bar     5
3   bar     6
4   foobar  20
5   foobar  1
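As a quick sanity check, the three approaches can be run side by side and compared. This sketch repeats the setup so that it runs on its own. The names keep_mask and drop_mask are illustrative stand-ins for filter_time_max, and only the final comparison is new.

```python
import pandas as pd

df = pd.DataFrame()
df['name'] = ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar']
df['time'] = [5, 2, 5, 6, 20, 1]
grouped = df.groupby('name')

# Answer 1: collect the row labels of the groups to drop, then drop them.
indexes_should_drop = grouped.filter(lambda x: (x['time'].max() <= 5)).index
result1 = df.drop(index=indexes_should_drop)

# Answer 2: keep the rows whose group name is in the "keep" set.
keep_mask = grouped['time'].max() > 5
groups_should_keep = keep_mask.loc[keep_mask].index
result2 = df.loc[df['name'].isin(groups_should_keep)]

# Answer 3: drop the rows whose group name is in the "drop" set.
drop_mask = grouped['time'].max() <= 5
groups_should_drop = drop_mask.loc[drop_mask].index
result3 = df.drop(df[df['name'].isin(groups_should_drop)].index)

# Compare the three results row for row.
all_equal = result1.equals(result2) and result2.equals(result3)
print(all_equal)
```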

Notes

My answer 1 does not use group names to drop the groups. If you need the group names, you can get them with df.loc[indexes_should_drop].name.unique().
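To illustrate the difference between row labels and group names, here is a small sketch on the same sample data. The variable dropped_names is a hypothetical name introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Row labels (not group names) of the rows that belong to groups to drop.
indexes_should_drop = grouped.filter(lambda x: (x['time'].max() <= 5)).index
print(list(indexes_should_drop))

# Look those rows up again to recover which group names they belong to.
dropped_names = df.loc[indexes_should_drop, 'name'].unique()
print(list(dropped_names))
```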

grouped['time'].max() <= 5 and grouped.apply(lambda x: (x['time'].max() <= 5)) return the same True/False value for each group, and both are indexed by group name.
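This equivalence can be checked directly. In the sketch below, apply is called without a trailing .index so that the boolean values themselves are compared. The names via_column and via_apply are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Column-wise aggregation: one boolean per group.
via_column = grouped['time'].max() <= 5

# Same test written with apply(): the function returns one scalar per group.
via_apply = grouped.apply(lambda x: (x['time'].max() <= 5))

# Both are indexed by group name and carry the same booleans.
same_index = via_column.index.equals(via_apply.index)
same_values = bool((via_column.to_numpy() == via_apply.to_numpy()).all())
print(same_index, same_values)
```

The column-wise form is usually the simpler choice, since it avoids calling a Python function for every group.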

The index of filter_time_max consists of the group names. It cannot be passed directly to drop as row indexes or labels.

name
bar       False
foo        True
foobar    False
Name: time, dtype: bool
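The sketch below shows why the group names cannot be handed to drop as they are, and then one possible alternative. The transform('max') approach is not part of the answer above; it is a different technique, offered here as an assumption about what may be convenient, that builds a row-aligned mask without going through group names at all.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

filter_time_max = grouped['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index  # group names

# Group names are not row labels of df, so drop() cannot find them.
try:
    df.drop(index=groups_should_drop)
    drop_failed = False
except KeyError:
    drop_failed = True
print(drop_failed)

# Alternative (not from the answer): transform() broadcasts each group's
# maximum back onto its own rows, so the mask lines up with df's index.
row_mask = grouped['time'].transform('max') > 5
result = df[row_mask]
print(result)
```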

Answer 1
1 2014-07-15 16:36:49

Answer 2
0 2019-09-01 11:39:45