
Drop pandas dataframe rows based on groupby condition

I have a pandas DataFrame like the one below:

    text    name    target
0   str1    name1   1
1   str1    name2   3
2   str1    name2   3
3   str2    name1   2
4   str2    name1   2
5   str2    name1   4
6   str3    name3   3

I need to remove the rows whose target value occurs only once. In this case, the rows at index 0 and 5 should be removed, because targets 1 and 4 each appear only once.

I looked into this post and tried the following:

df[df.groupby(['target']).transform('sum') > 1]

But that does not seem to work. Can anyone suggest a fix?

Hope this suffices: group by target and keep only the groups that contain more than one row.

df.groupby('target').filter(lambda x: x.count().gt(1).any())

    text    name    target
1   str1    name2   3
2   str1    name2   3
3   str2    name1   2
4   str2    name1   2
6   str3    name3   3
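
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby/filter idea with a simpler predicate (len(g) > 1). It also shows a boolean-mask variant using transform("size"), which may be closer to what the attempt in the question was aiming for: transform('sum') adds up column values, whereas "size" counts the rows in each group. The variable names (kept_filter, group_sizes, kept_mask) are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# Variant 1: keep only the groups that contain more than one row.
kept_filter = df.groupby("target").filter(lambda g: len(g) > 1)

# Variant 2: broadcast each group's row count back onto the original rows,
# then use it as a boolean mask.
group_sizes = df.groupby("target")["target"].transform("size")
kept_mask = df[group_sizes > 1]

print(kept_filter)
```

Both variants should leave the rows at index 1, 2, 3, 4 and 6. The mask version avoids calling a Python lambda per group, which can matter on frames with many distinct targets.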

You can use value_counts, map the counts back onto the target column, and filter:

print(df[df.target.map(df.target.value_counts()).gt(1)])

Output:

   text   name  target
1  str1  name2       3
2  str1  name2       3
3  str2  name1       2
4  str2  name1       2
6  str3  name3       3
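
The one-liner above can be unpacked into separate steps, which may make it easier to see what each call contributes. The sketch below rebuilds the sample frame from the question. The intermediate names (counts, per_row_count, result) are illustrative, and the duplicated(keep=False) line is an alternative I am adding for comparison, not part of the answer above.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# Step 1: how often each target value occurs (a Series indexed by target value).
counts = df["target"].value_counts()

# Step 2: look up that count for every row of the original frame.
per_row_count = df["target"].map(counts)

# Step 3: keep the rows whose target occurs more than once.
result = df[per_row_count.gt(1)]

# Alternative (an assumption, not from the answer): duplicated(keep=False)
# flags every row whose target value appears more than once.
result_dup = df[df["target"].duplicated(keep=False)]

print(result)
```

On this data both approaches should select the same rows (index 1, 2, 3, 4 and 6).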
