How to remove unique rows in a pandas DataFrame
index SUBJECT
1 test
2 Hello
3 Hello
4 PRC review - phone calls
After removal:
index SUBJECT
2 Hello
3 Hello
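I want to remove rows based only on the "SUBJECT" column, dropping every row whose SUBJECT value appears only once. How can I do this?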
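Use duplicated with keep=False, which marks every occurrence of a repeated value (not just the later ones). For example: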
import pandas as pd
df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})
df = df[df.duplicated(subset=["SUBJECT"], keep=False)]
print(df)
Output:
SUBJECT
1 Hello
2 Hello
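The keep argument is what makes this work. The short comparison below uses the same sample data and contrasts the default keep="first", which leaves the first "Hello" unmarked, with keep=False, which marks both rows so the boolean mask retains them.

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# keep="first" (the default) marks only the later repeats as duplicates
first = df.duplicated(subset=["SUBJECT"], keep="first")
# keep=False marks every occurrence of a value that appears more than once
all_dups = df.duplicated(subset=["SUBJECT"], keep=False)

print(first.tolist())     # [False, False, True, False]
print(all_dups.tolist())  # [False, True, True, False]
```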
You can also do it like this:
# get count for each value
s = df.SUBJECT.value_counts()
# get only those that appear more than once
repeated = set(s[s > 1].index.values)
# keep only the rows whose SUBJECT is in that set
result = df[df.SUBJECT.isin(repeated)]
print(result)
Output:
index SUBJECT
1 2 Hello
2 3 Hello
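A self-contained version of the value_counts approach is sketched below. It assumes, as the output above suggests, that "index" is an ordinary column rather than the DataFrame's own index. Since `Series.isin` accepts any list-like, passing the `Index` of the filtered counts directly should work just as well as converting it to a `set` first.

```python
import pandas as pd

# Assumption: "index" is a regular column, matching the table in the question
df = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"],
})

# count how often each SUBJECT value occurs
s = df.SUBJECT.value_counts()
# keep rows whose SUBJECT occurs more than once; isin takes the Index as-is
result = df[df.SUBJECT.isin(s[s > 1].index)]
print(result)
```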
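Check this one, which counts each SUBJECT with groupby and keeps the rows whose count is above 1:
counts = df.groupby('SUBJECT').size()
df.loc[df['SUBJECT'].map(counts) > 1, :]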
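Another groupby-based option, offered here as an alternative rather than as the answer's own code, is `DataFrameGroupBy.filter`. It calls a function once per group and keeps all rows of the groups for which the function returns True, preserving the original row order and index.

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# keep every row belonging to a SUBJECT group with more than one member
result = df.groupby("SUBJECT").filter(lambda g: len(g) > 1)
print(result)
```

This reads clearly but runs a Python function per group, so on data with many distinct subjects the vectorized duplicated or transform approaches are likely to be faster.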
Using loc:
>>> df.loc[df.duplicated(subset='SUBJECT', keep=False), :]
SUBJECT
1 Hello
2 Hello
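Passing subset matters as soon as the frame has more than one column: without it, duplicated compares entire rows. The sketch below adds a hypothetical SENDER column (not part of the question) to show the difference.

```python
import pandas as pd

# SENDER is a hypothetical extra column used only for illustration
df = pd.DataFrame({
    "SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"],
    "SENDER": ["a", "b", "c", "d"],
})

# without subset, whole rows are compared, and no two rows are identical
whole_row = df.duplicated(keep=False)
print(whole_row.tolist())  # [False, False, False, False]

# with subset, only the SUBJECT column is compared
mask = df.duplicated(subset="SUBJECT", keep=False)
print(df.loc[mask, :])
```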
Another way, using groupby + transform:
>>> df[df.groupby('SUBJECT')['SUBJECT'].transform('size') > 1]
SUBJECT
1 Hello
2 Hello
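The transform version works because transform("size") broadcasts each group's size back onto the rows of that group, so the resulting Series is aligned with the original index and can be used directly as a mask. The sketch below prints those per-row counts and, as an assumed extension, uses a hypothetical threshold n to keep values that appear at least n times.

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# one count per original row, aligned with df.index
counts = df.groupby("SUBJECT")["SUBJECT"].transform("size")
print(counts.tolist())  # [1, 2, 2, 1]

# n is a hypothetical minimum-occurrence threshold; n = 2 reproduces "> 1"
n = 2
result = df[counts >= n]
print(result)
```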