Find similar rows in a Pandas DataFrame and subtract specific column values

Question

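I know there are similar questions and solutions here, but I haven't found one that fits my case exactly.

I want to find rows that match on all columns except one.

For example: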

     ColumnA     ColumnB     ColumnC    ColumnD  ColumnE  
1      John        Texas       USA        115       5
2      Mike        Florida     USA        66        1
3      John        Texas       USA        115       4
4      Justin      NewYork     USA        22        11

The logic I'm trying to implement is:

for every entry in the dataframe:
    if there exists "another" entry with all Columns similar, apart from ColumnE
    AND
    the value of ColumnE in First entry found "MINUS" the value of ColumnE in second entry found is "LESS" than "1":
        Then append the entry to a new DataFrame

So far I've gotten somewhere using df.loc and df.duplicated. The problem and the data are somewhat complex, so I can't post the code here.

Any help with this would be greatly appreciated.

Thanks, Rob

Answer 1

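I'm not sure exactly what format you want the result in, so I built a dictionary whose keys are the index of a given row and whose values are lists of the indices of rows that differ from it in exactly one entry: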

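import pandas as pd

def ndif(a, b):
    # Count the positions at which rows a and b hold different values.
    return sum(x != y for x, y in zip(a, b))

d = pd.DataFrame([[1,2,3],[1,2,4],[3,2,4],[3,0,4],[5,0,3]])

# Map each row index to the indices of rows that differ from it in exactly one entry.
just1 = {}

for k in d.index:
    # Compare every row against row k; counts[i] is the number of differing entries.
    counts = d.apply(ndif, args=(d.loc[k],), axis=1)
    just1[k] = [i for i, n in counts.items() if n == 1]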
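
If the goal is specifically the logic in the question (rows that agree on every column except ColumnE and whose ColumnE values are close), a self-merge on the other columns might be a more direct route. The following is only a sketch under assumptions: the function name close_matches and the parameters value_col and max_diff are made up for illustration, and I read "less than 1" as "absolute difference of at most max_diff", which may not be what was intended.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ColumnA": ["John", "Mike", "John", "Justin"],
        "ColumnB": ["Texas", "Florida", "Texas", "NewYork"],
        "ColumnC": ["USA", "USA", "USA", "USA"],
        "ColumnD": [115, 66, 115, 22],
        "ColumnE": [5, 1, 4, 11],
    },
    index=[1, 2, 3, 4],
)

def close_matches(frame, value_col="ColumnE", max_diff=1):
    """Hypothetical helper: keep rows that have a partner row equal on all
    other columns and within max_diff (assumed threshold) on value_col."""
    key_cols = [c for c in frame.columns if c != value_col]
    tagged = frame.copy()
    tagged["_row"] = range(len(frame))  # positional id to tell partners apart
    # Self-merge pairs every row with each row sharing the same key columns.
    pairs = tagged.merge(tagged, on=key_cols, suffixes=("", "_other"))
    is_other_row = pairs["_row"] != pairs["_row_other"]
    is_close = (pairs[value_col] - pairs[value_col + "_other"]).abs() <= max_diff
    positions = sorted(pairs.loc[is_other_row & is_close, "_row"].unique())
    return frame.iloc[positions]

result = close_matches(df)
print(result)
```

On the sample data this keeps the two John rows (index 1 and 3). Note that the self-merge grows quadratically within each group of matching rows, so it may need rethinking if a single group is very large.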

Solution 1
0 2020-03-08 01:31:09
