I have two dataframes with a common column called 'upc' as such:
df1:
upc
23456793749
78907809834
35894796324
67382808404
93743008374
df2:
upc
4567937
9078098
8947963
3828084
7430083
Notice that df2 'upc' values are the innermost 7 values of df1 'upc' values. Note that both df1 and df2 have other columns not shown above. What I want to do is do an inner merge on 'upc' but only on the innermost 7 values. How can I achieve this?
Using str.extact
, match all items in df1 with df2, then we using the result as merge
key merge
with df2
df1['keyfordf2']=df1.astype(str).upc.str.extract(r'({})'.format('|'.join(df2.upc.astype(str).tolist())),expand=True).fillna(False)
df1.merge(df2.astype(str),left_on='keyfordf2',right_on='upc')
Out[273]:
upc_x keyfordf2 upc_y
0 23456793749 4567937 4567937
1 78907809834 9078098 9078098
2 35894796324 8947963 8947963
3 67382808404 3828084 3828084
4 93743008374 7430083 7430083
1) Create both dataframes and convert to string type.
2) pd.merge the two frames, but using the left_on keyword to access the inner 7 characters of your 'upc' series
df1 = pd.DataFrame(data=[
23456793749,
78907809834,
35894796324,
67382808404,
93743008374,], columns = ['upc1'])
df1 = df1.astype(str)
df2 = pd.DataFrame(data=[
4567937,
9078098,
8947963,
3828084,
7430083,], columns = ['upc2'])
df2 = df2.astype(str)
pd.merge(df1, df2, left_on=df1['upc1'].astype(str).str[2:-2], right_on='upc2', how='inner')
Out[5]:
upc1 upc2
0 23456793749 4567937
1 78907809834 9078098
2 35894796324 8947963
3 67382808404 3828084
4 93743008374 7430083
You could make a new column in df1 and merge on that.
import pandas as pd
df1= pd.DataFrame({'upc': [ 23456793749, 78907809834, 35894796324, 67382808404, 93743008374]})
df2= pd.DataFrame({'upc': [ 4567937, 9078098, 8947963, 3828084, 7430083]})
df1['upc_old'] = df1['upc'] #in case you still need the old (longer) upc column
df1['upc'] = df1['upc'].astype(str).str[2:-2].astype(int)
merged_df = pd.merge(df1, df2, on='upc')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.