I ran a groupby on a pandas dataframe to see how many rows there are for each date and location.
agg_count = df.groupby(['date', 'location']).count()
Now I want to see only the rows of this new dataframe that satisfy a particular condition, say a count greater than 50. How do I iterate over this huge dataframe efficiently to get those rows?
Starting with this data
In [275]: df = pd.DataFrame({'date': [20130101, 20130101, 20130102], 'location': ['a', 'a', 'c']})
In [276]: df
Out[276]:
       date location
0  20130101        a
1  20130101        a
2  20130102        c
This selects the rows of every group that has a count > 1:
In [277]: df.groupby(['date', 'location']).apply(lambda sdf: sdf if len(sdf) > 1 else None)
Out[277]:
                         date location
date     location
20130101 a        0  20130101        a
                  1  20130101        a
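As an alternative to returning None from apply, pandas also provides groupby(...).filter, which keeps or drops whole groups based on a boolean returned per group. The snippet below is a minimal sketch on the same toy data; the variable name filtered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': [20130101, 20130101, 20130102],
                   'location': ['a', 'a', 'c']})

# Keep only the rows whose (date, location) group has more than one row.
# filter() returns the original rows with their original index,
# so no multi-index is introduced.
filtered = df.groupby(['date', 'location']).filter(lambda g: len(g) > 1)
print(filtered)
```

Because the result keeps the original index, no reset_index call should be needed afterwards.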
To drop the multi-index from the apply result, reset the index:
In [278]: df.groupby(['date', 'location']).apply(lambda sdf: sdf if len(sdf) > 1 else None).reset_index(drop=True)
Out[278]:
       date location
0  20130101        a
1  20130101        a
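For the original question (groups with a count above some threshold), explicit iteration is usually not needed: a boolean mask on the group sizes does the selection in one vectorized step. The following is a rough sketch under the assumption that only the row count per group matters; threshold, counts, big_groups, group_size and rows are hypothetical names, and the threshold is set to 1 here so that the toy data produces a result (the question uses 50).

```python
import pandas as pd

df = pd.DataFrame({'date': [20130101, 20130101, 20130102],
                   'location': ['a', 'a', 'c']})

threshold = 1  # hypothetical value; the question asks about 50

# size() yields one row count per (date, location) pair as a Series.
counts = df.groupby(['date', 'location']).size()

# Boolean indexing keeps only the groups above the threshold,
# without a Python-level loop over the groups.
big_groups = counts[counts > threshold]
print(big_groups)

# transform('size') broadcasts each group's size back onto its rows,
# so the same comparison can select rows of the original dataframe.
group_size = df.groupby(['date', 'location'])['date'].transform('size')
rows = df[group_size > threshold]
print(rows)
```

The first form answers "which groups are large", the second "which original rows belong to large groups"; which one fits depends on what is needed downstream.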