Match string values in two different DataFrames and create a new column with match indicator in Pandas
I have two DataFrames (df1, df2) and I would like to create a new column in df1 that indicates whether each value in its Code column also appears in the Code column of df2. The Code column in df2 is made up of strings separated by commas.
df1
Date Code
2016-01-01 LANH08
2016-01-01 LAOH07
2016-01-01 LAPH09
2016-01-01 LAQH06
2016-01-01 LARH03
df2
Date Code
2016-01-01 LANH08, LAOH07, LXA0EW, LAGRL1
2016-01-01 LAUH02, LAVH00, LAVH01, LAYH00
2016-01-01 LANH08
2016-01-01 AAH00, ABH00, XAH03
2016-01-01 ARH04, BA0BW, BMH01, DPH00
My Goal
df1
Date Code Match
2016-01-01 LANH08 Y
2016-01-01 LAOH07 Y
2016-01-01 LAPH09 N
2016-01-01 LAQH06 N
2016-01-01 LARH03 N
import pandas as pd

# Split df2['Code'] into a list of codes per row
df2['Code'] = df2['Code'].str.split(', ')
# Reshape df2 to one row per code; DataFrame.explode (pandas >= 0.25) replaces
# the original .ix-based loop, since .ix was removed in pandas 1.0.
# Duplicate (Date, Code) pairs are dropped so the merge below cannot repeat rows of df1.
df2 = df2.explode('Code').drop_duplicates().reset_index(drop=True)
# Default df2['Match'] to 'Y'
df2['Match'] = 'Y'
# Create a new DataFrame by left merging df1 with df2
df3 = df1.merge(df2, on=['Date', 'Code'], how='left')
# Fill NaN values in the Match column with 'N' (because they weren't in df2)
df3['Match'] = df3['Match'].fillna('N')
Final Solution:
from pandas import DataFrame

data1 = {'Date':['2016-01-01',
'2016-01-01',
'2016-01-01',
'2016-01-01',
'2016-01-01'],
'Code':['LANH08',
'LAOH07',
'LAPH09',
'LAQH06',
'LARH03']}
df1 = DataFrame(data1)
data2 = {'Date':['2016-01-01',
'2016-01-01',
'2016-01-01',
'2016-01-01',
'2016-01-01'],
'Code':['LANH08, LAOH07, LXA0EW, LAGRL1',
'LAUH02, LAVH00, LAVH01, LAYH00',
'LANH08',
'AAH00, ABH00, XAH03',
'LAUH02, LAVH00']}
df2 = DataFrame(data2)
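# One row per code: split the strings, then stack the resulting columns into a Series
df2 = DataFrame(df2.Code.str.split(', ').tolist(), index=df2.Date).stack()
df2 = df2.reset_index()[[0, 'Date']] # Code variable is currently labeled 0
df2.columns = ['Code', 'Date'] # Renaming Code
# Drop duplicates on (Code, Date) pairs, so a code repeated on another date is kept
df2 = df2.drop_duplicates()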
# default df2['match'] to 'Y'
df2['Match'] = 'Y'
# Create new dataframe by left merging df1 with df2
df3 = df1.merge(df2, left_on = ['Code', 'Date'], right_on = ['Code', 'Date'], how = 'left')
# Fill NaN values in Match column with 'N' (because they weren't in df2)
df3['Match'] = df3['Match'].fillna('N')
df3
Code Date Match
0 LANH08 2016-01-01 Y
1 LAOH07 2016-01-01 Y
2 LAPH09 2016-01-01 N
3 LAQH06 2016-01-01 N
4 LARH03 2016-01-01 N
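For comparison, here is a minimal sketch of the same idea on newer pandas versions, assuming DataFrame.explode is available (pandas 0.25 or later) and that the codes are always separated by exactly ', ' as in the sample data. The names lookup, merged and result are only illustrative and do not come from the original post. Instead of adding a 'Y' column and filling NaN, it uses the indicator option of merge to tell which rows of df1 found a partner.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2016-01-01'] * 5,
    'Code': ['LANH08', 'LAOH07', 'LAPH09', 'LAQH06', 'LARH03'],
})
df2 = pd.DataFrame({
    'Date': ['2016-01-01'] * 5,
    'Code': ['LANH08, LAOH07, LXA0EW, LAGRL1',
             'LAUH02, LAVH00, LAVH01, LAYH00',
             'LANH08',
             'AAH00, ABH00, XAH03',
             'LAUH02, LAVH00'],
})

# Build a lookup table with one row per unique (Date, Code) pair.
lookup = (
    df2.assign(Code=df2['Code'].str.split(', '))  # string -> list of codes
       .explode('Code')                           # one row per code
       .drop_duplicates(subset=['Date', 'Code'])  # keep the merge one-to-one
)

# indicator=True adds a '_merge' column: 'both' means the pair exists in lookup.
merged = df1.merge(lookup, on=['Date', 'Code'], how='left', indicator=True)

# The lookup keys are unique, so merged has the same rows, in the same order, as df1.
result = df1.copy()
result['Match'] = np.where(merged['_merge'] == 'both', 'Y', 'N')
print(result)
```

Because the lookup table is de-duplicated before merging, the left merge cannot add rows, which is what makes it safe to copy the indicator back onto df1 by position.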