Pandas dataframe: comparing columns of different sizes across different dataframes

Question

I am trying to compare two dataframes:

df1:

       entry  mass  Precursor  mass_pos
0     KGTLPK   128    642.780   770.780
1     KGTLPK    48    642.780   690.780
2     KGTLPK   112    642.780   754.780
3     KGTLPK    32    642.780   674.780
4     KGTLPK   156    642.780   798.780

df2:

My goal is to find any matches between df2 'Mass' and df1 'mass_pos'.

I tried something like this:

df1['masses match'] = np.where(df2['Mass'] == df1['mass_pos'], 'True', 'False')

But this raises a ValueError:

ValueError: Can only compare identically-labeled Series objects

I think this is because the two dataframes have a different number of rows. Is there a way around this?

Answer 1

Use merge to find the values that are present in both dataframes.

Example:

df=df1.merge(df2,left_on="mass_pos",right_on="Mass")

Code:

import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
                    'mass': [128 ,48, 112 , 32, 156],
                    'Precursor': [642.780,642.780,642.780,642.780,642.780],
                    'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
                    })

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})

df=df1.merge(df2,left_on="mass_pos",right_on="Mass")

print(df)

Output:

    entry  mass  Precursor  mass_pos    Mass
0  KGTLPK    48     642.78    690.78  690.78
1  KGTLPK   156     642.78    798.78  798.78
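
The inner merge above keeps only the matching rows. If every row of df1 should be kept and simply flagged, a left merge with indicator=True is one possible variant. This is only a sketch: the column name 'masses match' is borrowed from the question, the drop_duplicates() call is my own precaution so that repeated masses in df2 do not duplicate rows of df1, and the merge still relies on exact float equality.

```python
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

# Left merge keeps every row of df1; the '_merge' column added by
# indicator=True says whether a matching 'Mass' was found in df2.
merged = df1.merge(df2.drop_duplicates(), left_on='mass_pos',
                   right_on='Mass', how='left', indicator=True)

# 'both' means the key exists in df1 and df2; 'left_only' means no match.
merged['masses match'] = merged['_merge'] == 'both'

# Drop the helper columns so only df1's columns plus the flag remain.
result = merged.drop(columns=['Mass', '_merge'])
print(result)
```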

To get the expected result, I did this:

import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
                    'mass': [128 ,48, 112 , 32, 156],
                    'Precursor': [642.780,642.780,642.780,642.780,642.780],
                    'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
                    })

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})

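DF3 = df1[['mass_pos']].fillna('None')
DF4 = df2[['Mass']].fillna('None')
df1['Merge_Result'] = 'F'
# Compare every row of df1 with every row of df2
for i in range(DF3.shape[0]):
    for k in range(DF4.shape[0]):
        if list(DF3.iloc[i]) == list(DF4.iloc[k]):
            # .loc avoids chained assignment, which may not write back to df1
            df1.loc[i, 'Merge_Result'] = 'T'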
            
print(df1)

Output:

    entry  mass  Precursor  mass_pos Merge_Result
0  KGTLPK   128     642.78    770.78            F
1  KGTLPK    48     642.78    690.78            T
2  KGTLPK   112     642.78    754.78            F
3  KGTLPK    32     642.78    674.78            F
4  KGTLPK   156     642.78    798.78            T
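
For what it's worth, the same 'T'/'F' column can probably be produced without the nested loop by using Series.isin. This is a sketch under the same assumption as the loop, namely that the values must be exactly equal; it swaps the loop for isin plus np.where and is not part of the code above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

# isin returns one boolean per row of df1, so the different
# lengths of df1 and df2 no longer matter.
is_match = df1['mass_pos'].isin(df2['Mass'])

# Map the booleans to the 'T'/'F' labels used in the loop version.
df1['Merge_Result'] = np.where(is_match, 'T', 'F')
print(df1)
```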

Answer 2

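Use np.isclose, which compares floats within a tolerance and does not require the two columns to have the same length:

import numpy as np

tolerance = 1e-03  # absolute tolerance for treating two masses as equal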
match_mass = lambda x: np.any(np.isclose(x, df2['Mass'], atol=tolerance))
df1['masses match'] = df1['mass_pos'].apply(match_mass)
print(df1)

# Output:
    entry  mass  Precursor  mass_pos  masses match
0  KGTLPK   128     642.78    770.78         False
1  KGTLPK    48     642.78    690.78          True
2  KGTLPK   112     642.78    754.78         False
3  KGTLPK    32     642.78    674.78         False
4  KGTLPK   156     642.78    798.78          True
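
If df1 is large, the per-row apply could be replaced by a single broadcast comparison. A rough sketch, not a drop-in replacement: it builds a len(df1) x len(df2) boolean matrix, so memory grows with the product of the two lengths. Note also that np.isclose accepts a difference of up to atol + rtol * |b| with rtol defaulting to 1e-05; passing rtol=0 here is my own choice to make the tolerance purely absolute.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

tolerance = 1e-03

# Shape (5, 1) against shape (1, 4) broadcasts to a (5, 4) matrix
# holding every df1-vs-df2 comparison.
left = df1['mass_pos'].to_numpy()[:, None]
right = df2['Mass'].to_numpy()[None, :]
close = np.isclose(left, right, rtol=0, atol=tolerance)

# A row of df1 matches if it is close to at least one df2 mass.
df1['masses match'] = close.any(axis=1)
print(df1)
```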

Answer 1
1 2021-12-13 16:48:38

Answer 2
1 Accepted 2021-12-13 16:52:48