Remove duplicates based on values in a dataframe column

Question

Scenario

key     associated_keys                                          value            associated_value
KP6070  KP706010/KP706020/KP706030/KP706040/KP706050/KP706060/   AFE.706070.KP    AFE.706010.RT
KP6650  KP706610/KP706620//KP706630/KP706640/KP706650            AFE.706650.KP    AFE.706010.RT

I tried the following Python code.

Deduptest.groupby(['associated_keys']).max()['associated_value'].reset_index()


Deduptest.drop_duplicates(['associated_value'], keep='first')

Expected output

key     associated_keys                                          value            associated_value
KP6070  KP706010/KP706020/KP706030/KP706040/KP706050/KP706060/   AFE.706070.KP    AFE.706010.RT

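I am trying to remove duplicates based on the associated_value and associated_keys columns. If a value in associated_keys already exists in any other row of that column, and the associated_value data is the same for both rows, then I want to keep the row with the greater length, that is, the one with more data.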

I tried drop_duplicates and also tried using a length function, but I keep getting both rows in my output.
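One possible reason both rows still show up, offered as an assumption because the question does not show how the result was used: drop_duplicates returns a new DataFrame and leaves the original unchanged unless the result is assigned back or inplace=True is passed. A minimal sketch with the two sample rows from the question (the variable name deduped is made up for illustration):

```python
import pandas as pd

# Sample rows copied from the question; the name Deduptest follows the question's code.
Deduptest = pd.DataFrame({
    "key": ["KP6070", "KP6650"],
    "associated_keys": [
        "KP706010/KP706020/KP706030/KP706040/KP706050/KP706060/",
        "KP706610/KP706620//KP706630/KP706640/KP706650",
    ],
    "value": ["AFE.706070.KP", "AFE.706650.KP"],
    "associated_value": ["AFE.706010.RT", "AFE.706010.RT"],
})

# drop_duplicates returns a new DataFrame; Deduptest itself is not modified.
deduped = Deduptest.drop_duplicates(subset=["associated_value"], keep="first")

print(len(Deduptest))  # the original frame still holds both rows
print(len(deduped))    # the returned frame holds one row per associated_value
```

Note that keep="first" keeps whichever row comes first in the frame, so on its own it does not guarantee that the row with the longest associated_keys survives.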

Answer 1

Try:

# set up the key to get proper order
df["sort_key"]=df["associated_keys"].str.len()

# sort by that key
df.sort_values("sort_key", inplace=True, ascending=False)

# drop duplicates, keeping only the first record (the frame is sorted, so this is the one with the longest associated_keys)
df.drop_duplicates(subset="associated_value", keep="first", inplace=True)

# drop sort column
df.drop("sort_key", axis=1, inplace=True)
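
The steps above can also be written as a single chain that does not modify the input frame. This is only a sketch under stated assumptions: the helper name keep_longest_per_value is hypothetical, and the sample rows are the two from the question, deliberately listed with the shorter row first to show that length, not position, decides which row is kept.

```python
import pandas as pd


def keep_longest_per_value(df: pd.DataFrame) -> pd.DataFrame:
    """For each associated_value, keep the row with the longest associated_keys string."""
    return (
        df.assign(sort_key=df["associated_keys"].str.len())        # temporary length column
          .sort_values("sort_key", ascending=False)                # longest strings first
          .drop_duplicates(subset="associated_value", keep="first")
          .drop(columns="sort_key")                                # remove the helper column
    )


# Sample rows from the question, with the shorter row placed first on purpose.
sample = pd.DataFrame({
    "key": ["KP6650", "KP6070"],
    "associated_keys": [
        "KP706610/KP706620//KP706630/KP706640/KP706650",
        "KP706010/KP706020/KP706030/KP706040/KP706050/KP706060/",
    ],
    "value": ["AFE.706650.KP", "AFE.706070.KP"],
    "associated_value": ["AFE.706010.RT", "AFE.706010.RT"],
})

result = keep_longest_per_value(sample)
print(result)
```

Because assign returns a copy, sample keeps both rows and never gains the sort_key column. The comparison here is on raw string length, as in the answer; counting the keys between the slashes instead would be a different rule.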

Solution 1
1 Accepted 2020-07-16 15:34:36
