Drop pandas dataframe rows based on groupby condition

Question

I have a pandas DataFrame like the one below:

    text    name    target
0   str1    name1   1
1   str1    name2   3
2   str1    name2   3
3   str2    name1   2
4   str2    name1   2
5   str2    name1   4
6   str3    name3   3

I need to remove the rows whose target value occurs only once in the whole DataFrame. In this case I need to remove the rows at index 0 and 5, because the target values 1 and 4 each appear only once.

I looked into this post and tried the following:

df[df.groupby(['target']).transform('sum') > 1]

But that does not seem to work. Can anyone suggest a fix?

Answer 1

Hope this suffices: group by target and keep only the groups that contain more than one row, which drops every row whose target value occurs just once.

df.groupby('target').filter(lambda x: x.count().gt(1).any())

    text    name    target
1   str1    name2   3
2   str1    name2   3
3   str2    name1   2
4   str2    name1   2
6   str3    name3   3
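
For comparison, here is a sketch of how the transform-based attempt from the question could be adapted, assuming the sample data above. The attempt sums the non-key columns instead of counting rows, and it produces a whole frame rather than one boolean per row. Selecting the target column and transforming with 'size' gives a per-row group size instead. The names group_size and kept are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# For every row, the number of rows sharing its target value,
# aligned to the original index (one value per row).
group_size = df.groupby("target")["target"].transform("size")

# Keep only rows whose target value occurs more than once.
kept = df[group_size > 1]
print(kept)
```

With this data the boolean mask is a single Series, so indexing with it drops whole rows rather than masking individual cells.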

Answer 2

You can use value_counts, map the counts back onto the target column, and filter:

print(df[df.target.map(df.target.value_counts()).gt(1)])

Output:

   text   name  target
1  str1  name2       3
2  str1  name2       3
3  str2  name1       2
4  str2  name1       2
6  str3  name3       3
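
The one-liner above can be unpacked into its three steps. This is only a sketch on the sample data from the question, and the intermediate names counts, per_row and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# Step 1: count how often each target value occurs.
# The result is a Series indexed by target value.
counts = df["target"].value_counts()

# Step 2: look up each row's target value in that Series,
# giving the occurrence count for every row.
per_row = df["target"].map(counts)

# Step 3: keep the rows whose target value occurs more than once.
result = df[per_row.gt(1)]
print(result)
```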

2 answers

solution1
1 2020-02-28 03:24:33

solution2
1 ACCEPTED 2020-02-28 03:33:16
