How to remove unique rows in a pandas DataFrame
index SUBJECT
1 test
2 Hello
3 Hello
4 PRC review - phone calls
AFTER REMOVING
index SUBJECT
2 Hello
3 Hello
I want to delete rows based only on the "SUBJECT" column: rows whose SUBJECT value appears just once should be removed. How do I do this?
Use duplicated with keep=False.
Ex:
import pandas as pd
df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})
df = df[df.duplicated(subset=["SUBJECT"], keep=False)]
print(df)
Output:
SUBJECT
1 Hello
2 Hello
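To see why keep=False matters here, the sketch below compares the boolean masks that duplicated returns for keep=False and for the default keep="first". The variable names mask_all and mask_first are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# keep=False flags every row that belongs to a duplicated group
mask_all = df.duplicated(subset=["SUBJECT"], keep=False)

# keep="first" (the default) leaves the first occurrence unflagged
mask_first = df.duplicated(subset=["SUBJECT"], keep="first")

print(mask_all.tolist())    # [False, True, True, False]
print(mask_first.tolist())  # [False, False, True, False]
```

With the default, filtering would keep only the second "Hello" row, which is not what the question asks for.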
You could do:
# get count for each value
s = df.SUBJECT.value_counts()
# get only those that appear more than once
repeated = set(s[s > 1].index.values)
# filter the data-frame based on those repeated values
result = df[df.SUBJECT.isin(repeated)]
print(result)
Output:
index SUBJECT
1 2 Hello
2 3 Hello
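A possible variation on the same idea, sketched under the assumption that the frame has an "index" column as in the question: map each row's SUBJECT to its count and filter on that directly, skipping the intermediate set. The names counts and per_row are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question, with "index" as an ordinary column (assumption)
df = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"],
})

# Number of occurrences of each SUBJECT value
counts = df["SUBJECT"].value_counts()

# Look up the count for every row, aligned with df's own row labels
per_row = df["SUBJECT"].map(counts)
print(per_row.tolist())  # [1, 2, 2, 1]

# Keep only the rows whose SUBJECT appears more than once
result = df[per_row > 1]
print(result)
```

Bracket access (df["SUBJECT"]) is used instead of attribute access because a column named "index" would clash with the DataFrame's own .index attribute.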
Check this out:
df.loc[df.groupby('SUBJECT')['SUBJECT'].transform('count') > 1, :]
Using loc:
>>> df.loc[df.duplicated(keep=False), :]
SUBJECT
1 Hello
2 Hello
Another way, with groupby + transform:
>>> df[df.groupby('SUBJECT')['SUBJECT'].transform('size') > 1]
SUBJECT
1 Hello
2 Hello
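As a quick sanity check, the sketch below runs the duplicated, value_counts, and groupby + transform approaches on the sample data and confirms they select the same rows. The variable names are illustrative. The approaches may treat missing values differently, which this sample does not exercise.

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# Approach 1: flag all members of duplicated groups
by_duplicated = df[df.duplicated(subset=["SUBJECT"], keep=False)]

# Approach 2: count values, then keep rows whose value is repeated
counts = df["SUBJECT"].value_counts()
by_counts = df[df["SUBJECT"].isin(counts[counts > 1].index)]

# Approach 3: broadcast each group's size back onto its rows
by_transform = df[df.groupby("SUBJECT")["SUBJECT"].transform("size") > 1]

# All three should yield identical frames for this data
assert by_duplicated.equals(by_counts)
assert by_duplicated.equals(by_transform)

print(by_duplicated["SUBJECT"].tolist())  # ['Hello', 'Hello']
```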