![](/img/trans.png)
[英]pandas: compare string columns from two different dataframes of different sizes
[英]Pandas dataframe compare columns of different sizes in different dataframes
我正在嘗試比較兩個數據框:
df1:
entry mass Precursor mass_pos
0 KGTLPK 128 642.780 770.780
1 KGTLPK 48 642.780 690.780
2 KGTLPK 112 642.780 754.780
3 KGTLPK 32 642.780 674.780
4 KGTLPK 156 642.780 798.780
df2:
Mass
0 586.672
1 798.780
2 690.780
3 400.000
我的目標是找到 df2 'mass' 與 df1 'mass_pos' 的任何匹配項。
我真的很喜歡這個:
df1['masses match'] = np.where(df2['Mass'] == df1['mass_pos'], 'True', 'False')
但這會引發一個值錯誤:
ValueError: Can only compare identically-labeled Series objects
我認為這是因為這些數據幀的行數不同。 有沒有辦法克服這個問題?
使用合並檢查存在的項目。
例子:
df=df1.merge(df2,left_on="mass_pos",right_on="Mass")
代碼:
import pandas as pd
df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
'mass': [128 ,48, 112 , 32, 156],
'Precursor': [642.780,642.780,642.780,642.780,642.780],
'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})
df=df1.merge(df2,left_on="mass_pos",right_on="Mass")
print(df)
Output:
entry mass Precursor mass_pos Mass
0 KGTLPK 48 642.78 690.78 690.78
1 KGTLPK 156 642.78 798.78 798.78
為了得到預期的結果,我這樣做了:
import pandas as pd
df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
'mass': [128 ,48, 112 , 32, 156],
'Precursor': [642.780,642.780,642.780,642.780,642.780],
'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})
DF3 = df1[['mass_pos']].fillna('None')
DF4 = df2[['Mass']].fillna('None')
df1['Merge_Result'] = 'F'
for i in range(DF3.shape[0]):
for k in range(DF4.shape[0]):
if list(DF3.iloc[i]) == list(DF4.iloc[k]):
df1['Merge_Result'][i] = 'T'
print(df1)
Output:
entry mass Precursor mass_pos Merge_Result
0 KGTLPK 128 642.78 770.78 F
1 KGTLPK 48 642.78 690.78 T
2 KGTLPK 112 642.78 754.78 F
3 KGTLPK 32 642.78 674.78 F
4 KGTLPK 156 642.78 798.78 T
使用np.isclose
:
tolerance = 1e-03
match_mass = lambda x: np.any(np.isclose(x, df2['Mass'], atol=tolerance))
df1['masses match'] = df1['mass_pos'].apply(match_mass)
print(df1)
# Output:
entry mass Precursor mass_pos masses match
0 KGTLPK 128 642.78 770.78 False
1 KGTLPK 48 642.78 690.78 True
2 KGTLPK 112 642.78 754.78 False
3 KGTLPK 32 642.78 674.78 False
4 KGTLPK 156 642.78 798.78 True
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.