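Pandas: remove duplicates from each row
I have a CSV file that contains multiple duplicate values within each row. I want to remove these duplicates so that only the unique values remain.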
DataFrame:
1                            2      3                   4      5                            6
Bypass User Account Control  T3431  Elevated Execution  T3424  Bypass User Account Control  T3431
Local Account                T3523  Domain Account      T4252  Local Account                T3523
Expected DataFrame:
1                            2      3                   4      5  6
Bypass User Account Control  T3431  Elevated Execution  T3424
Local Account                T3523  Domain Account      T4252
The rows contain 100 duplicated values, and I only want to see the unique values.
Use unique to reduce each row to its unique values. The output is an array, so convert it to a Series:
df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
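To try this without the CSV file, the sketch below rebuilds the sample data by hand and applies the same unique approach. The construction of df is an assumption made for illustration; the question itself loads the data from a file.

```python
import pandas as pd

# Sample data rebuilt by hand (assumption: the real data comes from a CSV file).
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# For each row, keep the first occurrence of every value, in order of appearance.
df1 = df.apply(lambda row: pd.Series(row.unique()), axis=1)
print(df1)
```

If rows end up with different numbers of unique values, the shorter rows are padded with NaN so that the result stays rectangular.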
Alternatively:
df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
Finally, to restore the original column names, use:
df1.columns = df.columns[:len(df1.columns)]
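As a rough end-to-end illustration of the renaming step, the sketch below deduplicates a hand-built copy of the sample data (an assumption, since the original data comes from a CSV file) and then reuses the leading labels of the original frame.

```python
import pandas as pd

# Hand-built stand-in for the CSV data (assumption for illustration).
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# Drop repeated values in each row and renumber the surviving positions from 0.
df1 = df.apply(lambda row: row.drop_duplicates().reset_index(drop=True), axis=1)

# Relabel the result with the first len(df1.columns) labels of the original frame.
df1.columns = df.columns[:len(df1.columns)]
print(df1.columns.tolist())
```

The slice is needed because the deduplicated frame has fewer columns than the original, so assigning df.columns directly would raise a length mismatch error.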
Alternatively, stack the frame, drop the duplicates within each original row, and unstack:
(df.stack()
.groupby(level=0).apply(lambda x: x.drop_duplicates())
.unstack()
.reset_index(drop=True))
Result:
                             1      2                   3      4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
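The stack-based variant can be checked the same way. This is a minimal sketch on hand-built sample data (an assumption; the original frame is read from a CSV file), with the intermediate steps named for readability.

```python
import pandas as pd

# Hand-built stand-in for the CSV data (assumption for illustration).
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# Long format: one value per (row label, column label) pair.
stacked = df.stack()

# Within each original row (index level 0), keep only first occurrences.
deduped = stacked.groupby(level=0).apply(lambda s: s.drop_duplicates())

# Pivot the column labels back out and discard the leftover row index levels.
result = deduped.unstack().reset_index(drop=True)
print(result)
```

Unlike the apply-based versions, this one keeps each surviving value under the column label where it first appeared, so the values are not shifted left. On this sample the first occurrences sit in columns 1 to 4 in both rows, which is why the output is compact.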