Error when trying to use df.merge: "You are trying to merge on object and int64 columns"
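I'm currently trying to write a program that takes a compound's identifier (called a CID number) and then uses pubchempy to return the compound's properties.
However, when I try to merge the values obtained from pubchempy back into the initial data set, I keep getting an error message.
Here is the code I have written so far: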
import pandas as pd
import pubchempy
import numpy as np
df = pd.read_csv("Data.tsv.txt", sep="\t")
from pubchempy import get_properties
df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('.0',''))
df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('0',''))
df = df.drop(df[df.CID=='nan'].index)
df = df.drop(labels='reference', axis=1)
df = df.drop(labels='group', axis=1)
df = df.drop(labels='comments', axis=1)
df = df.drop(labels='compound_name', axis=1)
props = ['HBondDonorCount', 'RotatableBondCount', 'MolecularWeight', 'HBondAcceptorCount']
df2 = pd.DataFrame(get_properties(identifier=df.CID.to_list(), properties=props))
df = df.merge(df2)
print(df)
However, I get an error message that says:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Does anyone know how to solve this?
A few lines of the text file (the data file):
NO. compound_name IUPAC_name SMILES CID Inchi threshold reference group comments
1 sulphasalazine 2-hydroxy-5-[[4-(pyridin-2-ylsulfamoyl)phenyl]diazenyl]benzoic acid O=C(O)c1cc(N=Nc2ccc(S(=O)(=O)Nc3ccccn3)cc2)ccc1O 5339 InChI=1S/C18H14N4O5S/c23-16-9-6-13(11-15(16)18(24)25)21-20-12-4-7-14(8-5-12)28(26,27)22-17-3-1-2-10-19-17/h1-11,23H,(H,19,22)(H,24,25) R2|R2|R25|R46| A
2 moxalactam 7-[[2-carboxy-2-(4-hydroxyphenyl)acetyl]amino]-7-methoxy-3-[(1-methyltetrazol-5-yl)sulfanylmethyl]-8-oxo-5-oxa-1-azabicyclo[4.2.0]oct-2-ene-2-carboxylic acid COC1(NC(=O)C(C(=O)O)c2ccc(O)cc2)C(=O)N2C(C(=O)O)=C(CSc3nnnn3C)COC21 3889 InChI=1S/C20H20N6O9S/c1-25-19(22-23-24-25)36-8-10-7-35-18-20(34-2,17(33)26(18)13(10)16(31)32)21-14(28)12(15(29)30)9-3-5-11(27)6-4-9/h3-6,12,18,27H,7-8H2,1-2H3,(H,21,28)(H,29,30)(H,31,32) R25| A
3 clioquinol 5-chloro-7-iodoquinolin-8-ol Oc1c(I)cc(Cl)c2cccnc12 2788 InChI=1S/C9H5ClINO/c10-6-4-7(11)9(13)8-5(6)2-1-3-12-8/h1-4,13H R18|R26|R27| A
A few lines of the df2 output:
CID MolecularWeight HBondDonorCount HBondAcceptorCount RotatableBondCount
0 5339 398.4 3 9 6
1 3889 520.5 4 13 9
2 2788 305.50 1 2 0
3 1422517 440.5 0 8 4
4 18595497 461.5 5 10 3
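It looks like you want to merge the two data frames on the CID column. The CID column of df2 has dtype int64, while the CID column of df holds strings (dtype object) after your astype(str) cleanup. You need to convert df2's CID column to str so that it matches the type of CID in df: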
df = df.merge(df2.astype({'CID': str}), on='CID')
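To see the mismatch and the fix in isolation, here is a minimal sketch that does not call pubchempy at all. The frames `left` and `right` and the series `raw` are hypothetical stand-ins built from the sample rows in the question, not the real data. The last part shows an alternative cleanup, assuming the CID column was read as floats with missing values: converting to integers directly avoids the string replacements, and in particular `x.replace('0','')` in the question removes every zero digit from a CID (for example "1020" would become "12").

```python
import pandas as pd

# Hypothetical stand-in for df after the question's cleanup: CID stored as strings.
left = pd.DataFrame({"CID": ["5339", "3889", "2788"], "threshold": ["A", "A", "A"]})

# Hypothetical stand-in for df2: CID stored as int64, as in the df2 output above.
right = pd.DataFrame({"CID": [5339, 3889, 2788],
                      "MolecularWeight": [398.4, 520.5, 305.5]})

# Inspect the key dtypes before merging; they differ.
print(left["CID"].dtype, right["CID"].dtype)

# Merging string keys against integer keys raises ValueError.
error_message = None
try:
    left.merge(right, on="CID")
except ValueError as exc:
    error_message = str(exc)
print("merge failed:", error_message)

# Fix from the answer: cast the integer key to str so both sides match.
merged = left.merge(right.astype({"CID": str}), on="CID")
print(merged)

# Alternative: cast the string key to int64 instead and keep CID numeric.
merged_int = left.astype({"CID": "int64"}).merge(right, on="CID")

# Alternative cleanup sketch (assumes CID was read as float with NaN gaps):
# drop the missing values, then convert straight to integers.
raw = pd.Series([5339.0, 3889.0, None, 2788.0], name="CID")
clean = raw.dropna().astype("int64")
print(clean.tolist())
```

Either direction works as long as both key columns end up with the same dtype. Keeping CID as an integer on both sides is arguably the tidier choice if the identifiers are later used numerically.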