
How to remove duplicates from the format item in dataframe cell?

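I have a dataframe like df1 below. I would like to remove the items that are duplicated inside an item containing -. For example, row 1 and row 3 should drop 1A and 1A, 2B respectively, as in df2. How can I remove the duplicates?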

DataFrame:

from pandas import DataFrame

df1 = DataFrame({'Condition': ['1A', '1A, 1A-1A', '1A, 2B', '1A, 2B, 1A-2B', '3C, 1A-2B']})

df1
    Condition
0   1A
1   1A, 1A-1A
2   1A, 2B
3   1A, 2B, 1A-2B
4   3C, 1A-2B

Desired output:

df2 = DataFrame({'Condition': ['1A', '1A-1A', '1A, 2B', '1A-2B', '3C, 1A-2B']})

df2
    Condition
0   1A
1   1A-1A
2   1A, 2B
3   1A-2B
4   3C, 1A-2B

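You can build a set from the parts of the values that contain -, keep only the split values that are not in that set, and finally join them back together: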

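L = []
for x in df1['Condition']:
    # split the cell into its comma-separated items
    a = x.split(', ')
    # collect the parts of every hyphenated item, e.g. '1A-2B' -> {'1A', '2B'}
    s = {z for y in a if '-' in y for z in y.split('-')}
    # keep only the items that are not already part of a hyphenated item
    L.append(', '.join(z for z in a if z not in s))

df1['new'] = L
print(df1)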
       Condition        new
0             1A         1A
1      1A, 1A-1A      1A-1A
2         1A, 2B     1A, 2B
3  1A, 2B, 1A-2B      1A-2B
4      3C, 1A-2B  3C, 1A-2B
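
The same idea can also be wrapped in a small helper so it is easier to reuse and test. This is only a sketch: the name drop_covered_items and its sep / joiner parameters are made up for illustration and do not come from the original answer, and it assumes every cell is a string whose items are separated by exactly ', '. It is applied to a plain list here so it runs without pandas.

```python
def drop_covered_items(cell, sep=', ', joiner='-'):
    """Drop items that already appear as a part of a hyphenated item.

    Hypothetical helper: assumes `cell` is a string such as '1A, 2B, 1A-2B'.
    """
    items = cell.split(sep)
    # Parts of every hyphenated item, e.g. '1A-2B' -> {'1A', '2B'}
    covered = {part
               for item in items if joiner in item
               for part in item.split(joiner)}
    # Hyphenated items themselves are never in `covered`, so they are kept
    return sep.join(item for item in items if item not in covered)


cells = ['1A', '1A, 1A-1A', '1A, 2B', '1A, 2B, 1A-2B', '3C, 1A-2B']
result = [drop_covered_items(c) for c in cells]
print(result)
```

With the dataframe from the question, the helper could be applied as df1['new'] = df1['Condition'].map(drop_covered_items). Missing values (NaN) are not handled in this sketch and would need to be filtered or filled first.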

