I have a dataframe that I want to filter based on group size. For example, I want to group by 'Name' and 'Date' and keep only the groups whose size is greater than 2.
Name Date Symbol
0 Ajay 2018_Q1 AA
1 Ajay 2018_Q1 BB
2 Ajay 2018_Q1 CC
3 Ajay 2018_Q1 DD
4 Ajay 2019_Q1 AA
5 Faye 2019_Q1 DD
6 Faye 2019_Q1 AA
7 Faye 2019_Q1 ZZ
8 Faye 2018_Q1 AA
9 Faye 2018_Q1 EE
So the output dataframe should look like this:
Name Date Symbol
0 Ajay 2018_Q1 AA
1 Ajay 2018_Q1 BB
2 Ajay 2018_Q1 CC
3 Ajay 2018_Q1 DD
5 Faye 2019_Q1 DD
6 Faye 2019_Q1 AA
7 Faye 2019_Q1 ZZ
How do I achieve this?
You can use the groupby filter method, which keeps every row of each group for which the function returns True:
df.groupby(['Name', 'Date']).filter(lambda x: x['Symbol'].size > 2)
or
df.groupby(['Name', 'Date']).filter(lambda x: x.shape[0] > 2)
Output:
Name Date Symbol
0 Ajay 2018_Q1 AA
1 Ajay 2018_Q1 BB
2 Ajay 2018_Q1 CC
3 Ajay 2018_Q1 DD
5 Faye 2019_Q1 DD
6 Faye 2019_Q1 AA
7 Faye 2019_Q1 ZZ
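For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example dataframe from the question and applies the filter approach; the variable names (filtered, sizes, filtered_alt) are only illustrative. It also shows an alternative that is not part of the answer above: computing each row's group size with transform('size') and using it as a boolean mask, which avoids calling a Python lambda once per group and may be quicker on large frames.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "Name": ["Ajay"] * 5 + ["Faye"] * 5,
    "Date": ["2018_Q1"] * 4 + ["2019_Q1"] * 4 + ["2018_Q1"] * 2,
    "Symbol": ["AA", "BB", "CC", "DD", "AA", "DD", "AA", "ZZ", "AA", "EE"],
})

# Approach from the answer: keep groups with more than 2 rows.
filtered = df.groupby(["Name", "Date"]).filter(lambda g: len(g) > 2)

# Alternative (an assumption, not from the answer): broadcast each group's
# size back onto its rows, then select rows with a boolean mask.
sizes = df.groupby(["Name", "Date"])["Symbol"].transform("size")
filtered_alt = df[sizes > 2]

print(filtered)
```

Both variants keep the original index labels, so the row for Ajay 2019_Q1 (index 4) and the two rows for Faye 2018_Q1 (indexes 8 and 9) are simply missing from the result.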