How to merge two columns from different DataFrames and write "True" in a new column when a match is found, using pandas
I'm working on a pandas project. I have two DataFrames similar to the ones below:
DF1 :
Data1 Data2 Data3
Head Cat Fire
Limbs Dog Snow
Eyes Fish Water
Mouth Dragon Air
DF2 :
Data1 Data2
Limbs Dog
Mouth Dragon
Head Cat
Based on the above DataFrames, I need to compare both DFs and, if a match is found, write "True" in a separate column, otherwise "False".
For example, say I pick the first row of DF2, with the combination (Limbs, Dog). This combination should be searched for in DF1. As we can see, it exists in the 2nd row, so DF1's Data3 value "Snow" should be written to DF2's Data3 column, and "True" should be written to a new column because a match was found.
Expected output:
Data1 Data2 Data3 Data4
Limbs Dog Snow True
Mouth Dragon Air True
Head cat Fire True
Eyes Fish Water False
Currently, I have tried merging the two DataFrames.
Current code:
df3 = pd.merge(df, valid_req , on=['Data1','Data2' ])
df3
Data1 Data2 Data3
Limbs Dog Snow
Mouth Dragon Air
Head cat Fire
How can I achieve the expected output?
You can assign a temporary column to df2 and then merge using how='left':
In [1665]: df2['tmp'] = 1
In [1668]: x = df1.merge(df2, on=['Data1', 'Data2'], how='left')
In [1667]: x
Out[1667]:
Data1 Data2 Data3 tmp
0 Head Cat Fire 1.0
1 Limbs Dog Snow 1.0
2 Eyes Fish Water NaN
3 Mouth Dragon Air 1.0
Finally, use numpy.where to assign the new column Data4: True if x['tmp'] == 1, else False:
In [1668]: import numpy as np
In [1669]: x['Data4'] = np.where(x.tmp.eq(1), True, False)
Drop the unnecessary tmp column using df.drop. Then x is your final output:
In [1671]: x.drop(columns='tmp', inplace=True)  # positional axis (x.drop('tmp', 1)) no longer works in pandas 2.x
In [1672]: x
Out[1672]:
Data1 Data2 Data3 Data4
0 Head Cat Fire True
1 Limbs Dog Snow True
2 Eyes Fish Water False
3 Mouth Dragon Air True
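For reference, the temporary-column steps above can be condensed into one self-contained script. This is only a sketch: the DataFrame construction is reconstructed from the tables in the question, and using assign() with notna() in place of np.where is a substitution of mine, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question's tables
df1 = pd.DataFrame({
    "Data1": ["Head", "Limbs", "Eyes", "Mouth"],
    "Data2": ["Cat", "Dog", "Fish", "Dragon"],
    "Data3": ["Fire", "Snow", "Water", "Air"],
})
df2 = pd.DataFrame({
    "Data1": ["Limbs", "Mouth", "Head"],
    "Data2": ["Dog", "Dragon", "Cat"],
})

# assign() returns a copy, so df2 itself is not modified
x = df1.merge(df2.assign(tmp=1), on=["Data1", "Data2"], how="left")

# Unmatched rows get NaN in tmp, so notna() yields the flag directly;
# pop() removes the helper column in the same step
x["Data4"] = x.pop("tmp").notna()
print(x)
```

The left join keeps the row order of df1, which is why the result is ordered differently from the expected output in the question.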
Use DataFrame.merge with a left join and the indicator=True parameter, then create the new column by comparing the _merge column with 'both', using DataFrame.pop to remove that column:
df = df1.merge(df2, on=['Data1', 'Data2'], how='left', indicator=True)
df['Data4'] = df.pop('_merge').eq('both')
print (df)
Data1 Data2 Data3 Data4
0 Head Cat Fire True
1 Limbs Dog Snow True
2 Eyes Fish Water False
3 Mouth Dragon Air True
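To see what indicator=True contributes, it can help to look at the _merge column before it is popped. A minimal sketch, assuming the sample data from the question (the DataFrame construction here is mine):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Data1": ["Head", "Limbs", "Eyes", "Mouth"],
    "Data2": ["Cat", "Dog", "Fish", "Dragon"],
    "Data3": ["Fire", "Snow", "Water", "Air"],
})
df2 = pd.DataFrame({
    "Data1": ["Limbs", "Mouth", "Head"],
    "Data2": ["Dog", "Dragon", "Cat"],
})

merged = df1.merge(df2, on=["Data1", "Data2"], how="left", indicator=True)

# _merge is a categorical column: 'both' where the key pair exists in df2,
# 'left_only' where it exists only in df1
print(merged[["Data1", "Data2", "_merge"]])

# Comparing with 'both' turns the indicator into a boolean flag
merged["Data4"] = merged.pop("_merge").eq("both")
```

With a left join the indicator can only be 'both' or 'left_only', so eq('both') is enough to separate matched rows from unmatched ones.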
Simply use the apply function on DF1 to create Data4:
import pandas as pd
DF1 = pd.DataFrame([
["Head", "Cat", "Fire"],
["Limbs", "Dog", "Snow"],
["Eyes", "Fish", "Water"],
["Mouth", "Dragon", "Air"]
], columns=["Data1", "Data2", "Data3"])
DF2 = pd.DataFrame([
["Limbs", "Dog", "Snow"],
["Mouth", "Dragon", "Air"],
["Head", "Cat", "Fire"]
], columns=["Data1", "Data2", "Data3"])
DF1["Data4"] = DF1["Data1"].apply(lambda cell: DF2[DF2["Data1"]==cell]["Data1"].count()>0)