Is there a function in Python to remove duplicates within a row without deleting the entire row?

Question

import pandas as pd

data=[["John","Alzheimer's","Infection","Alzheimer's"],["Kevin","Pneumonia","Pneumonia","Tuberculosis"]]
df=pd.DataFrame(data,columns=['Name','Problem1','Problem2','Problem3'])

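In this DataFrame, I want to go through each row and remove duplicates so that each person's problems are reported only once. That means removing the second "Alzheimer's" in the first row as a duplicate. I tried the drop_duplicates() function, but that deletes the entire row.

Any help would be appreciated!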

Answer 1

First, recreate the sample data:

import pandas as pd
data=[["John","Alzheimer's","Infection","Alzheimer's"],["Kevin","Pneumonia","Pneumonia","Tuberculosis"]]
df=pd.DataFrame(data,columns=['Name','Problem1','Problem2','Problem3'])

df

Now remove the duplicates by replacing them with a blank:

# Blank out Problem2 where it repeats Problem1
df['Problem2'] = df.apply(lambda x: x['Problem2'] if x['Problem2'] != x['Problem1'] else " ", axis=1)

# Blank out Problem3 where it repeats Problem2 or Problem1
df['Problem3'] = df.apply(lambda x: x['Problem3'] if not (x['Problem3'] == x['Problem2'] or x['Problem3'] == x['Problem1']) else " ", axis=1)
df

Answer 2

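You could try using df.duplicated for this. It is similar to df.drop_duplicates, but it returns a boolean Series instead of removing the duplicates. You can then use that boolean Series to index your original DataFrame and set the matching values to None.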
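A minimal sketch of that idea, under a few assumptions: DataFrame.duplicated flags whole rows, so to find repeats inside a row the sketch calls Series.duplicated on each row via apply(axis=1). The name problem_cols is hypothetical, and the masked values come out as NaN rather than None.

```python
import pandas as pd

data = [["John", "Alzheimer's", "Infection", "Alzheimer's"],
        ["Kevin", "Pneumonia", "Pneumonia", "Tuberculosis"]]
df = pd.DataFrame(data, columns=['Name', 'Problem1', 'Problem2', 'Problem3'])

# Hypothetical helper name: only these columns are checked for repeats,
# so a name can never be mistaken for a duplicated problem.
problem_cols = ['Problem1', 'Problem2', 'Problem3']

# Boolean DataFrame: True where a value already appeared earlier in the same row.
mask = df[problem_cols].apply(lambda row: row.duplicated(), axis=1).astype(bool)

# Replace the flagged cells with NaN and leave everything else untouched.
result = df.copy()
result[problem_cols] = df[problem_cols].mask(mask)
print(result)
```

Restricting the check to the problem columns is a design choice of this sketch, not something the answer specifies.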

Answer 3

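Use apply and duplicated.

Make sure to pass axis=1 to apply so that it operates on rows rather than columns. duplicated returns a boolean Series in which, by default, the first occurrence is marked False. Inverting that Series with ~ keeps the non-duplicated values and drops the repeated ones.

Setting up the example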

import pandas as pd

data=[["John","Alzheimer's","Infection","Alzheimer's"],["Kevin","Pneumonia","Pneumonia","Tuberculosis"]]
df=pd.DataFrame(data,columns=['Name','Problem1','Problem2','Problem3'])

df
    Name     Problem1   Problem2      Problem3
0   John  Alzheimer's  Infection   Alzheimer's
1  Kevin    Pneumonia  Pneumonia  Tuberculosis

Deduplication

deduped_df = df.apply(lambda row: row[~row.duplicated()],axis=1)

output

>>> deduped_df
    Name     Problem1   Problem2      Problem3
0   John  Alzheimer's  Infection           NaN
1  Kevin    Pneumonia        NaN  Tuberculosis
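The default behaviour described above corresponds to keep='first'. As a small illustration on a single row (the variable names here are made up for the example), the keep parameter of Series.duplicated can also mark the last occurrence as the one to keep, or flag every repeated value:

```python
import pandas as pd

# One row of problems, built by hand for illustration.
row = pd.Series(["Alzheimer's", "Infection", "Alzheimer's"],
                index=['Problem1', 'Problem2', 'Problem3'])

first = row.duplicated()               # default keep='first': later repeats are True
last = row.duplicated(keep='last')     # earlier repeats are True instead
every = row.duplicated(keep=False)     # all occurrences of a repeated value are True

print(row[~first])   # keeps Problem1 and Problem2
print(row[~last])    # keeps Problem2 and Problem3
print(row[~every])   # keeps only Problem2
```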

Answer 4

I wouldn't use a wide-format DataFrame. I would convert it to long format instead. So:

import pandas as pd

data = [["John", "Alzheimer's", "Infection", "Alzheimer's"],
        ["Kevin", "Pneumonia", "Pneumonia", "Tuberculosis"]]
df = pd.DataFrame(data, columns=['Name', 'Problem1', 'Problem2', 'Problem3'])

df.rename(columns=str.lower, inplace=True)  # personal preference
long_df = pd.wide_to_long(df, 'problem', i='name', j='index').sort_index()

This produces a table like the following:

                  problem
name  index              
John  1       Alzheimer's
      2         Infection
      3       Alzheimer's
Kevin 1         Pneumonia
      2         Pneumonia
      3      Tuberculosis

Then you can deduplicate as usual:

>>> long_df.reset_index().drop_duplicates(['name', 'problem'])
    name  index       problem
0   John      1   Alzheimer's
1   John      2     Infection
3  Kevin      1     Pneumonia
5  Kevin      3  Tuberculosis

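If this is a real problem and you really do need the wide format, you can just df.unstack() and then rename the resulting problem-index columns to turn it back into a wide DataFrame with the same format as before.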
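One possible way to do that round trip is sketched below. The answer does not spell out the steps, so the set_index call, the column renaming, and the names deduped and wide are assumptions made for illustration.

```python
import pandas as pd

data = [["John", "Alzheimer's", "Infection", "Alzheimer's"],
        ["Kevin", "Pneumonia", "Pneumonia", "Tuberculosis"]]
df = pd.DataFrame(data, columns=['Name', 'Problem1', 'Problem2', 'Problem3'])
df.rename(columns=str.lower, inplace=True)

# Wide -> long, then drop repeated (name, problem) pairs as in the answer.
long_df = pd.wide_to_long(df, 'problem', i='name', j='index').sort_index()
deduped = long_df.reset_index().drop_duplicates(['name', 'problem'])

# Long -> wide: pivot the 'index' level back into columns.
wide = deduped.set_index(['name', 'index'])['problem'].unstack('index')

# Rebuild the original-style column names (problem1, problem2, ...).
wide.columns = [f'problem{i}' for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Slots left empty by the deduplication show up as NaN in the wide result.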


Answer 1
0 2022-08-12 15:04:02

Answer 2
0 2022-08-12 15:05:02

Answer 3
0 2022-08-12 22:30:50

Answer 4
0 2022-08-13 20:57:12
