Pandas: removing duplicates within rows

Question

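I have a CSV file in which each row contains several duplicate values. I want to remove those duplicates so that only the unique values remain in each row.

DataFrame: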

 1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      Bypass User Account Control    T3431
Local Account                T3523      Domain Account       T4252      Local Account                  T3523

Expected DataFrame:

  1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      
Local Account                T3523      Domain Account       T4252

The rows contain 100 duplicate values, and I only want to see the unique ones.
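
To make the answers below reproducible, the two sample rows from the question can be rebuilt in memory. This is only a sketch: the integer column labels 1 to 6 and the variable name df are assumptions, since the original CSV file is not shown.

```python
import pandas as pd

# Sample rows copied from the question; the integer labels 1..6 are assumed.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)
print(df.shape)
```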

Answer 1

Use unique to reduce each row to its unique values. The output is an array, so convert it to a Series:

df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
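
One case the sample data does not exercise is rows with different numbers of unique values. As far as I can tell, apply aligns the returned Series by position and pads the shorter rows with NaN. The small frame below is made-up illustration data, not taken from the question, and the names ragged and ragged_unique are hypothetical:

```python
import pandas as pd

# Hypothetical data: row 0 has two unique values, row 1 has three.
ragged = pd.DataFrame([["a", "b", "a"], ["c", "d", "e"]])

# unique() keeps values in order of first appearance within each row.
ragged_unique = ragged.apply(lambda x: pd.Series(x.unique()), axis=1)
print(ragged_unique)
```

Row 0 ends up one value short, so its last column holds a missing value rather than an empty string.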

Or:

df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252

Finally, to restore the original column names, use:

df1.columns = df.columns[:len(df1.columns)]
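
Putting the steps of this answer together, a possible end-to-end sketch looks like the following. The helper name dedupe_rows and the integer column labels are assumptions made for illustration. The data is the sample from the question.

```python
import pandas as pd

def dedupe_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first occurrence of each value within every row."""
    out = df.apply(lambda row: pd.Series(row.unique()), axis=1)
    # Reuse the leading original labels for the columns that remain.
    out.columns = df.columns[:len(out.columns)]
    return out

# Sample rows from the question; the integer labels 1..6 are assumed.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

result = dedupe_rows(df)
print(result)
```

The relabeling is positional: it simply reuses the first few original labels, which matches the expected output in the question because the duplicates sit in the trailing columns.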

Answer 2

Use:

(df.stack()
  .groupby(level=0).apply(lambda x: x.drop_duplicates())
  .unstack()
  .reset_index(drop=True))

Result:

                             1      2                   3      4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
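
For anyone who wants to run this answer as-is, here is a self-contained sketch of the same chain on the question's sample data. The integer column labels and the variable name out are assumptions. The idea is that stack turns each row into a group of (row, column) entries, drop_duplicates keeps the first occurrence inside each group, and unstack pivots the survivors back into columns.

```python
import pandas as pd

# Sample rows from the question; the integer labels 1..6 are assumed.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

out = (
    df.stack()                                   # long Series indexed by (row, column)
      .groupby(level=0)                          # one group per original row
      .apply(lambda x: x.drop_duplicates())      # keep first occurrence per row
      .unstack()                                 # column labels back to columns
      .reset_index(drop=True)                    # flatten the row index
)
print(out)
```

Unlike the approach in Answer 1, the surviving values keep the column label of their first occurrence, so no separate renaming step should be needed.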

Answer 1 (accepted, score 1): posted 2021-02-03 10:52:17

Answer 2 (score 1): posted 2021-02-03 10:53:47
