
Pandas Remove Duplicates from row

I have a CSV file that has multiple duplicate values within its rows. I want to remove those duplicates so that only the unique values remain.

Dataframe:

 1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      Bypass User Account Control    T3431
Local Account                T3523      Domain Account       T4252      Local Account                  T3523

Expected Dataframe:

  1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      
Local Account                T3523      Domain Account       T4252                         

There are 100 duplicate values in the rows, and I only want to see the unique values.

Use unique to reduce each row to its unique values. The output is an array, so convert it to a Series:

df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print (df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252

Or:

df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print (df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252

Finally, to restore the original column names, use:

df1.columns = df.columns[:len(df1.columns)]
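The steps above can be put together as a minimal, self-contained sketch. The sample frame is rebuilt by hand from the question, and the string column labels "1" to "6" are an assumption (the real CSV may load them differently). The final `reindex` step is an optional addition, not part of the original answer, for cases where the six-column layout of the expected Dataframe should be kept.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=list("123456"),
)

# Keep the first occurrence of each value within every row.
# Rows with fewer unique values than others are padded with NaN.
df1 = df.apply(lambda row: pd.Series(row.unique()), axis=1)

# Reuse the leading original column names for the surviving columns.
df1.columns = df.columns[:len(df1.columns)]

# Optional: pad back to the original six columns; the extra ones hold NaN.
padded = df1.reindex(columns=df.columns)

print(df1)
print(padded)
```

Note that this compares whole cell values, so a value is dropped if it already appeared anywhere earlier in the same row, regardless of which column it was in.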

Use:

(df.stack()
  .groupby(level=0).apply(lambda x: x.drop_duplicates())
  .unstack()
  .reset_index(drop=True))

Result:

                             1      2                   3      4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
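For anyone who wants to try the stack-based variant directly, here is a small runnable sketch using the same pipeline. The sample frame is again rebuilt by hand from the question, with string column labels as an assumption.

```python
import pandas as pd

# Sample data copied from the question; string column labels are an assumption.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=list("123456"),
)

result = (
    df.stack()                                    # long Series indexed by (row, column)
      .groupby(level=0)                           # one group per original row
      .apply(lambda s: s.drop_duplicates())       # keep first occurrence in each row
      .unstack()                                  # column labels back to columns
      .reset_index(drop=True)                     # plain 0..n-1 row index
)

print(result)
```

Unlike the `apply` variant, this keeps each surviving value under the column label where it first appeared, so no renaming step is needed afterwards.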

