
Pandas dataframe compare columns of different sizes in different dataframes

I am trying to compare two dataframes:

df1:

       entry  mass  Precursor  mass_pos
0     KGTLPK   128    642.780   770.780
1     KGTLPK    48    642.780   690.780
2     KGTLPK   112    642.780   754.780
3     KGTLPK    32    642.780   674.780
4     KGTLPK   156    642.780   798.780

df2:

      Mass
0  586.672
1  798.780
2  690.780
3  400.000

My goal is to find any matches between df2 'Mass' and df1 'mass_pos'.

I tried this:

df1['masses match'] = np.where(df2['Mass'] == df1['mass_pos'], 'True', 'False')

But this raises a ValueError:

ValueError: Can only compare identically-labeled Series objects

I think this is because the dataframes have different numbers of rows. Is there a way to get around this?
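As a minimal sketch of what is going on (the two Series below are stand-ins for the columns in the question, and `raised` is just an illustrative flag): `==` between two Series compares them label by label, so pandas refuses when the indexes differ. Even with equal lengths it would only compare row 0 with row 0, row 1 with row 1, and so on, which is not a membership test.

```python
import pandas as pd

# Stand-ins for df1['mass_pos'] (index 0..4) and df2['Mass'] (index 0..3)
mass_pos = pd.Series([770.780, 690.780, 754.780, 674.780, 798.780])
mass = pd.Series([586.672, 798.780, 690.780, 400.000])

try:
    # Element-wise comparison requires identically labeled Series
    mass == mass_pos
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```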

Use merge to find the items that exist in both dataframes.

Example:

df=df1.merge(df2,left_on="mass_pos",right_on="Mass")

Code:

import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
                    'mass': [128 ,48, 112 , 32, 156],
                    'Precursor': [642.780,642.780,642.780,642.780,642.780],
                    'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
                    })

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})

df=df1.merge(df2,left_on="mass_pos",right_on="Mass")

print(df)

Output:

    entry  mass  Precursor  mass_pos    Mass
0  KGTLPK    48     642.78    690.78  690.78
1  KGTLPK   156     642.78    798.78  798.78
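The inner merge above keeps only the matching rows. If a flag on every row of df1 is wanted instead, one possible variation is a left merge with `indicator=True`. This is a sketch rather than part of the original answer: the names `flagged` and `'masses match'` are chosen here for illustration, and `drop_duplicates()` is added on the assumption that repeated masses in df2 should not multiply rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

# Left merge keeps every df1 row; the '_merge' column records where each row was found
flagged = df1.merge(df2.drop_duplicates(), how='left',
                    left_on='mass_pos', right_on='Mass', indicator=True)

# 'both' means the mass_pos value also appears in df2['Mass']
flagged['masses match'] = flagged['_merge'] == 'both'
flagged = flagged.drop(columns=['Mass', '_merge'])

print(flagged)
```

Like the plain merge, this matches on exact float equality, so values that differ by rounding error will not be flagged.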

To get the expected result, I did this:

import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
                    'mass': [128 ,48, 112 , 32, 156],
                    'Precursor': [642.780,642.780,642.780,642.780,642.780],
                    'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
                    })

df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})

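DF3 = df1[['mass_pos']].fillna('None')
DF4 = df2[['Mass']].fillna('None')
df1['Merge_Result'] = 'F'
# Compare every df1 row against every df2 row and flag the df1 rows that match
for i in range(DF3.shape[0]):
    for k in range(DF4.shape[0]):
        if list(DF3.iloc[i]) == list(DF4.iloc[k]):
            # .loc avoids chained assignment, which may not update df1
            df1.loc[df1.index[i], 'Merge_Result'] = 'T'
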
print(df1)

Output:

    entry  mass  Precursor  mass_pos Merge_Result
0  KGTLPK   128     642.78    770.78            F
1  KGTLPK    48     642.78    690.78            T
2  KGTLPK   112     642.78    754.78            F
3  KGTLPK    32     642.78    674.78            F
4  KGTLPK   156     642.78    798.78            T
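For exact matches, the nested loop can probably be replaced by `Series.isin`, which tests each value of one column for membership in another regardless of length. A minimal sketch, reusing the same data and the same 'T'/'F' labels:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

# isin returns one boolean per df1 row; np.where maps it to the 'T'/'F' labels
df1['Merge_Result'] = np.where(df1['mass_pos'].isin(df2['Mass']), 'T', 'F')

print(df1)
```

This avoids the row-by-row Python loop, though it still relies on exact float equality.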

Use np.isclose

import numpy as np

tolerance = 1e-03
# True if x is within tolerance of any value in df2['Mass']
match_mass = lambda x: np.any(np.isclose(x, df2['Mass'], atol=tolerance))
df1['masses match'] = df1['mass_pos'].apply(match_mass)
print(df1)

# Output:
    entry  mass  Precursor  mass_pos  masses match
0  KGTLPK   128     642.78    770.78         False
1  KGTLPK    48     642.78    690.78          True
2  KGTLPK   112     642.78    754.78         False
3  KGTLPK    32     642.78    674.78         False
4  KGTLPK   156     642.78    798.78          True
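The same tolerance check can be sketched without `apply` by broadcasting the two columns against each other. Note one assumption that differs from the snippet above: `np.isclose` also applies a relative tolerance (`rtol`, default 1e-05), so for masses around 700 the effective tolerance is larger than `atol` alone. Setting `rtol=0` here makes the tolerance purely absolute; drop it to keep the default behavior. The names `left`, `right` and `close` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'entry': ['KGTLPK'] * 5,
                    'mass': [128, 48, 112, 32, 156],
                    'Precursor': [642.780] * 5,
                    'mass_pos': [770.780, 690.780, 754.780, 674.780, 798.780]})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000]})

tolerance = 1e-03

left = df1['mass_pos'].to_numpy()[:, None]   # shape (5, 1)
right = df2['Mass'].to_numpy()               # shape (4,)

# Broadcasting yields a (5, 4) matrix: close[i, k] compares df1 row i with df2 row k
close = np.isclose(left, right, rtol=0, atol=tolerance)

# A df1 row matches if it is close to at least one df2 mass
df1['masses match'] = close.any(axis=1)

print(df1)
```

The intermediate matrix has len(df1) × len(df2) entries, so for very large frames the `apply` version or a sorted lookup may be easier on memory.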

