
Find and match a string in two DataFrames (Python, pandas)

I have two DataFrames (df1, df2). I would like to create a new column in df1: for each row, look up the "Name" value from df1 in the "Slave" column of df2, and insert the "FullName" of the matching df2 row into that same row of df1.

>>> df2

SN  Slave                 Add  FullName
0   21010730236TJ5900031  1    1.1.1
1   21010730236TJ5902800  2    1.1.2
2   21010730236TJ5902787  3    1.1.3
3   21010730236TJ5902784  4    1.1.4

>>> df1

SN  num    Name
0   #INV1  ESN:21010730236TJ5902772
1   #INV3  ESN:21010730236TJ5902787
2   #INV5  ESN:21010730236TJ5902785
3   #INV2  ESN:21010730236TJ5902800
4   #INV4  ESN:21010730236TJ5902784

Thank you.

import pandas as pd
data1 = [[0, "#INV1", "ESN:21010730236TJ5902772"],
         [1, "#INV3", "ESN:21010730236TJ5902787"],
         [2, "#INV5", "ESN:21010730236TJ5902785"],
         [3, "#INV2", "ESN:21010730236TJ5902800"],
         [4, "#INV4", "ESN:21010730236TJ5902784"]]

data2 = [[0, "21010730236TJ5900031", 1, "1.1.1"],
         [1, "21010730236TJ5902800", 2, "1.1.2"],
         [2, "21010730236TJ5902787", 3, "1.1.3"],
         [3, "21010730236TJ5902784", 4, "1.1.4"]]

df1 = pd.DataFrame(data1, columns=["SN","num","Name"])
df2 = pd.DataFrame(data2, columns=["SN","Slave","Add","FullName"])

You can do the following:

# Strip the "ESN:" prefix from df1 so that "Name" matches "Slave" exactly
# (regex=False treats "ESN:" as a literal string; note this modifies df1)
df1["Name"] = df1["Name"].str.replace("ESN:", "", regex=False)

# Left-merge df1 with df2 so that every row of df1 is kept
result = pd.merge(left=df1, right=df2, how="left", left_on="Name", right_on="Slave")

# Drop the df2 columns that are not needed in the result DataFrame
result.drop(["SN_y", "Add", "Slave"], axis=1, inplace=True)

# Rename the columns of the result DataFrame to match df1
result.columns = ["SN", "num", "Name", "FullName"]

# Add the "ESN:" prefix back
result["Name"] = "ESN:" + result["Name"]

This is what the result DataFrame looks like (adjust the index to whatever you need):

    SN  num     Name                        FullName
0   0   #INV1   ESN:21010730236TJ5902772    NaN
1   1   #INV3   ESN:21010730236TJ5902787    1.1.3
2   2   #INV5   ESN:21010730236TJ5902785    NaN
3   3   #INV2   ESN:21010730236TJ5902800    1.1.2
4   4   #INV4   ESN:21010730236TJ5902784    1.1.4
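
If you would rather add the column to df1 directly, without merging and then dropping and renaming columns, a lookup with Series.map is one possible alternative. This is only a sketch, not part of the approach above: it assumes that each "Slave" value occurs at most once in df2 (map needs a unique index to look up against), and the names key and lookup are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[0, "#INV1", "ESN:21010730236TJ5902772"],
     [1, "#INV3", "ESN:21010730236TJ5902787"],
     [2, "#INV5", "ESN:21010730236TJ5902785"],
     [3, "#INV2", "ESN:21010730236TJ5902800"],
     [4, "#INV4", "ESN:21010730236TJ5902784"]],
    columns=["SN", "num", "Name"],
)
df2 = pd.DataFrame(
    [[0, "21010730236TJ5900031", 1, "1.1.1"],
     [1, "21010730236TJ5902800", 2, "1.1.2"],
     [2, "21010730236TJ5902787", 3, "1.1.3"],
     [3, "21010730236TJ5902784", 4, "1.1.4"]],
    columns=["SN", "Slave", "Add", "FullName"],
)

# Hypothetical helper: a Series mapping Slave -> FullName.
# Assumption: "Slave" values are unique in df2.
lookup = df2.set_index("Slave")["FullName"]

# Hypothetical helper: the match key, i.e. Name without the literal "ESN:" prefix.
# df1["Name"] itself is left untouched.
key = df1["Name"].str.replace("ESN:", "", regex=False)

# Rows of df1 with no matching Slave get NaN in the new column.
df1["FullName"] = key.map(lookup)

print(df1)
```

Because the prefix is stripped into a separate Series, the "Name" column keeps its "ESN:" prefix and does not have to be restored afterwards.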
