
Filter dataframe by group size

I have a dataframe that I want to filter based on group size. For example, I want to group by 'Name' and 'Date' and keep only the rows that belong to groups with more than 2 rows.

   Name     Date Symbol
0  Ajay  2018_Q1     AA
1  Ajay  2018_Q1     BB
2  Ajay  2018_Q1     CC
3  Ajay  2018_Q1     DD
4  Ajay  2019_Q1     AA
5  Faye  2019_Q1     DD
6  Faye  2019_Q1     AA
7  Faye  2019_Q1     ZZ
8  Faye  2018_Q1     AA
9  Faye  2018_Q1     EE

So the output dataframe should look like this:

   Name     Date Symbol
0  Ajay  2018_Q1     AA
1  Ajay  2018_Q1     BB
2  Ajay  2018_Q1     CC
3  Ajay  2018_Q1     DD
5  Faye  2019_Q1     DD
6  Faye  2019_Q1     AA
7  Faye  2019_Q1     ZZ

How do I achieve this?

You can use the filter method of the groupby object, which keeps every row of each group for which the function returns True:

df.groupby(['Name', 'Date']).filter(lambda x: x['Symbol'].size > 2)

or

df.groupby(['Name', 'Date']).filter(lambda x: x.shape[0] > 2)

Output:

   Name     Date Symbol
0  Ajay  2018_Q1     AA
1  Ajay  2018_Q1     BB
2  Ajay  2018_Q1     CC
3  Ajay  2018_Q1     DD
5  Faye  2019_Q1     DD
6  Faye  2019_Q1     AA
7  Faye  2019_Q1     ZZ
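
For reference, here is a self-contained sketch that rebuilds the example dataframe from the question and compares filter with an alternative based on transform('size') and a boolean mask. The variable names (filtered, sizes, masked) are only illustrative, and the transform variant is an alternative technique, not part of the answer above.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "Name": ["Ajay"] * 5 + ["Faye"] * 5,
    "Date": ["2018_Q1"] * 4 + ["2019_Q1"] * 4 + ["2018_Q1"] * 2,
    "Symbol": ["AA", "BB", "CC", "DD", "AA", "DD", "AA", "ZZ", "AA", "EE"],
})

# Approach from the answer: keep groups with more than 2 rows.
filtered = df.groupby(["Name", "Date"]).filter(lambda g: len(g) > 2)

# Alternative: broadcast each group's size back onto its rows,
# then select rows with an ordinary boolean mask.
sizes = df.groupby(["Name", "Date"])["Symbol"].transform("size")
masked = df[sizes > 2]

print(filtered)
```

Both approaches keep the original row order and index labels. The mask variant does not call a Python function once per group, so it may be faster when there are many groups.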
