I have a groupby object:
grouped = df.groupby('name')
for k, group in grouped:
    print(group)
There are 3 groups: bar, foo and foobar.
name time
2 bar 5
3 bar 6
name time
0 foo 5
1 foo 2
name time
4 foobar 20
5 foobar 1
I need to filter these groups and drop every group that has no time greater than 5. In my example, the group foo should be dropped. I am trying to do it with the filter() function:
grouped.filter(lambda x: (x.max()['time']>5))
but x is apparently not just the group in DataFrame format.
Assuming your final line of code really should have >5 rather than >20, you would do something similar to:
grouped.filter(lambda x: (x.time > 5).any())
As you correctly spotted, x is actually a DataFrame containing all the rows whose name column matches the key you have in k in your for-loop. Since you want to keep a group only if any value in its time column is larger than 5, (x.time > 5).any() is the test to use: it is True for bar and foobar and False for foo.
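A minimal, self-contained sketch of the filter() call above, assuming pandas is installed and rebuilding the sample data from the question (the name kept is hypothetical):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Each x passed to the lambda is the sub-DataFrame for one name;
# the group is kept only if at least one of its times exceeds 5.
kept = grouped.filter(lambda x: (x.time > 5).any())
print(kept)
```

filter() keeps the original row labels and row order, so the rows of foo (labels 0 and 1) are simply missing from the result.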
I'm not yet used to Python, NumPy or pandas, but I was investigating a solution to a similar problem, so let me report my answers using this question as an example.
import pandas as pd
df = pd.DataFrame()
df['name'] = ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar']
df['time'] = [5, 2, 5, 6, 20, 1]
grouped = df.groupby('name')
for k, group in grouped:
    print(group)
# Answer1: collect the row labels of the groups whose max time is <= 5, then drop those rows
indexes_should_drop = grouped.filter(lambda x: (x['time'].max() <= 5)).index
result1 = df.drop(index=indexes_should_drop)
# Answer2: find the names of the groups whose max time is > 5, then keep only their rows
filter_time_max = grouped['time'].max() > 5
groups_should_keep = filter_time_max.loc[filter_time_max].index
result2 = df.loc[df['name'].isin(groups_should_keep)]
# Answer3: find the names of the groups whose max time is <= 5, then drop their rows
filter_time_max = grouped['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index
result3 = df.drop(df[df['name'].isin(groups_should_drop)].index)
# result1, result2 and result3 are all equal to:
name time
2 bar 5
3 bar 6
4 foobar 20
5 foobar 1
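For comparison, here is a different idiom that is not one of the answers above: GroupBy.transform('max') broadcasts each group's maximum back onto every row of that group, so the condition becomes an ordinary row mask on df. This is a sketch assuming pandas is installed; group_max and result4 are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})

# transform('max') returns a Series aligned with df's rows,
# where every row carries the maximum time of its own group.
group_max = df.groupby('name')['time'].transform('max')

# Keep only the rows whose group maximum exceeds 5
result4 = df[group_max > 5]
print(result4)
```

Because the mask is row-aligned, no group names have to be mapped back to row labels; with a built-in aggregation name such as 'max' it may also be faster than calling a Python lambda once per group.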
My Answer1 doesn't use group names to drop groups. If you need the group names, you can get them with df.loc[indexes_should_drop].name.unique().
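A short check of that tip on the sample data, assuming pandas is installed; dropped_names is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Row labels of every group whose maximum time is at most 5 (as in Answer1)
indexes_should_drop = grouped.filter(lambda x: x['time'].max() <= 5).index

# Look those rows up again and reduce them to the distinct group names
dropped_names = df.loc[indexes_should_drop].name.unique()
print(list(dropped_names))  # ['foo']
```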
grouped['time'].max() <= 5 and grouped.apply(lambda x: x['time'].max() <= 5) returned the same per-group boolean values.
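A quick way to confirm that on the sample data, assuming pandas is installed (via_max and via_apply are hypothetical names). Depending on the pandas version, the apply() call may emit a warning about operating on the grouping column; it does not change the values here.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Vectorised per-group maximum, then an element-wise comparison
via_max = grouped['time'].max() <= 5

# The same test evaluated by a Python lambda once per group
via_apply = grouped.apply(lambda x: x['time'].max() <= 5)

# Both are boolean Series indexed by group name (groupby sorts the keys)
print(via_max.tolist(), via_apply.tolist())
```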
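The index of filter_time_max holds group names, not row labels, so it cannot be passed to df.drop() as it is; it first has to be matched against the name column (for example with df['name'].isin(...), as in result2 and result3 above):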
name
bar False
foo True
foobar False
Name: time, dtype: bool
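Another way to bridge that gap, sketched here as an alternative to isin() and assuming pandas is installed: Series.map looks up each row's name in the group-indexed boolean Series, which yields a mask aligned with df's rows. drop_mask and result5 are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Per-group flag, indexed by group name rather than by row label
filter_time_max = grouped['time'].max() <= 5

# map() replaces each row's name with that group's flag,
# turning the per-group result into a per-row boolean mask
drop_mask = df['name'].map(filter_time_max)

# Keep the rows whose group should not be dropped
result5 = df.loc[~drop_mask]
print(result5)
```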