
How to remove unique rows in a pandas DataFrame

index                                            SUBJECT
1                                                   test
2                                                  Hello
3                                                  Hello
4                               PRC review - phone calls

After removal:

index                                            SUBJECT
2                                                  Hello
3                                                  Hello

I want to remove the unique rows based on the "SUBJECT" column only. How can I do this?

Use duplicated with keep=False, so that every occurrence of a repeated value is marked.

For example:

import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})
df = df[df.duplicated(subset=["SUBJECT"], keep=False)]
print(df)

Output:

  SUBJECT
1   Hello
2   Hello
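
The keep argument is what makes this work. As a minimal sketch, here are the three settings documented for DataFrame.duplicated applied to the same example data (the names mask_first, mask_last and mask_all are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# keep="first" (the default): the first occurrence is NOT flagged as a duplicate
mask_first = df.duplicated(subset=["SUBJECT"], keep="first")

# keep="last": the last occurrence is NOT flagged
mask_last = df.duplicated(subset=["SUBJECT"], keep="last")

# keep=False: every member of a repeated group is flagged
mask_all = df.duplicated(subset=["SUBJECT"], keep=False)

print(mask_first.tolist())  # [False, False, True, False]
print(mask_last.tolist())   # [False, True, False, False]
print(mask_all.tolist())    # [False, True, True, False]
```

Only the keep=False mask selects both "Hello" rows, which is why it is the one used for filtering.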

You could do this:

# get count for each value
s = df.SUBJECT.value_counts()

# get only those that appear more than once
repeated = set(s[s > 1].index.values)

# filter the data-frame based on the repeated values
result = df[df.SUBJECT.isin(repeated)]

print(result)

Output:

   index SUBJECT
1      2   Hello
2      3   Hello
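
For reference, a self-contained version of the same steps. The construction of df is an assumption, since this answer does not show it: it is rebuilt from the question's table with index as a regular column, which matches the output above.

```python
import pandas as pd

# Assumed reconstruction of the question's data, with "index" as a regular column
df = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"],
})

# count how many times each SUBJECT value appears
counts = df["SUBJECT"].value_counts()

# keep only the values that appear more than once
repeated = set(counts[counts > 1].index)

# keep the rows whose SUBJECT is one of the repeated values
result = df[df["SUBJECT"].isin(repeated)]

print(repeated)  # {'Hello'}
print(result)
```

Bracket access (df["SUBJECT"]) is used in this sketch because attribute access is ambiguous for some column names: with this data, df.index returns the row index, not the column named index.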

Try this:

df.loc[df['SUBJECT'].map(df.groupby('SUBJECT').size() > 1), :]

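Solution 1:

Using loc (this compares all columns; pass subset='SUBJECT' to restrict the comparison to that column):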

>>> df.loc[df.duplicated(keep=False), :]
  SUBJECT
1   Hello
2   Hello

Solution 2:

Another way, with groupby + transform:

>>> df[df.groupby('SUBJECT')['SUBJECT'].transform('size') > 1]
  SUBJECT
1   Hello
2   Hello
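
transform('size') broadcasts each group's row count back onto the original rows, so the comparison yields a boolean mask aligned with df. A small sketch showing the intermediate values and checking that, on the example data, the mask agrees with duplicated(keep=False) (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"SUBJECT": ["test", "Hello", "Hello", "PRC review - phone calls"]})

# size of the group each row belongs to, aligned with df's index
sizes = df.groupby("SUBJECT")["SUBJECT"].transform("size")
print(sizes.tolist())  # [1, 2, 2, 1]

# rows whose SUBJECT value occurs more than once
mask_transform = sizes > 1
mask_duplicated = df.duplicated(subset="SUBJECT", keep=False)

# both masks select the same rows for this data
print(bool((mask_transform == mask_duplicated).all()))  # True
```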

