I have a pandas DataFrame like the one below:
text name target
0 str1 name1 1
1 str1 name2 3
2 str1 name2 3
3 str2 name1 2
4 str2 name1 2
5 str2 name1 4
6 str3 name3 3
I need to remove the rows whose target value occurs only once. In this case I need to remove the rows with index 0 and 5, because the target values 1 and 4 each appear only once.
I looked into this post and tried the following:
df[df.groupby(['target']).transform('sum') > 1]
But that does not seem to work. Can anyone suggest a fix?
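The attempt above probably fails for two reasons: summing the target values is not the same as counting how often each value occurs, and calling transform on the whole grouped frame returns a DataFrame rather than a single row mask. A minimal sketch of a fix, assuming the frame is rebuilt from the data shown in the question, is to broadcast each group's size back to every row with transform('size') on the target column:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# transform("size") gives each row the number of rows in its target group,
# as a Series aligned with df.index, so it can be used as a boolean mask.
counts = df.groupby("target")["target"].transform("size")
result = df[counts > 1]
print(result)
```

This keeps rows 1, 2, 3, 4 and 6, the same rows as in the answers below.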
Hope this suffices: group by target and filter out the groups (and therefore the rows) whose count is not greater than 1:
df.groupby('target').filter(lambda x: x.count().gt(1).any())
text name target
1 str1 name2 3
2 str1 name2 3
3 str2 name1 2
4 str2 name1 2
6 str3 name3 3
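A possibly simpler predicate for the same groupby filter checks each group's length directly. This is a sketch that rebuilds the question's frame; note that len(g) counts every row, whereas count() skips NaN, so the two predicates can differ if the data contains missing values.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# filter() keeps every row of a group for which the function returns True;
# len(g) is the number of rows in that target group.
kept = df.groupby("target").filter(lambda g: len(g) > 1)
print(kept)
```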
You can use value_counts, map the counts back onto each row's target, and filter:
print(df[df.target.map(df.target.value_counts()).gt(1)])
Output:
text name target
1 str1 name2 3
2 str1 name2 3
3 str2 name1 2
4 str2 name1 2
6 str3 name3 3
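To make the intermediate step visible, the sketch below (again rebuilding the question's frame) prints the per-row counts produced by map, then cross-checks the mask against Series.duplicated(keep=False), a different technique that marks every row whose value occurs more than once:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "text": ["str1", "str1", "str1", "str2", "str2", "str2", "str3"],
    "name": ["name1", "name2", "name2", "name1", "name1", "name1", "name3"],
    "target": [1, 3, 3, 2, 2, 4, 3],
})

# value_counts maps each distinct target to its frequency;
# map looks that frequency up for every row.
per_row_count = df["target"].map(df["target"].value_counts())
mask = per_row_count.gt(1)

# Alternative: keep=False flags all occurrences of a repeated value.
dup_mask = df["target"].duplicated(keep=False)

print(per_row_count.tolist())
print(df[mask])
```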