I have two DataFrames:
First: df1
import pandas as pd

df1 = {'NAME': ['A','B','C','D'],
       'GROUP': ['A1','B1','C1','D1']
      }
df1 = pd.DataFrame(df1,columns=['NAME','GROUP'])
  NAME GROUP
0    A    A1
1    B    B1
2    C    C1
3    D    D1
Second: df2
df2 = {'NAME': ['AA','AAA','AAAA','BB','BBB','BBBB','CC','CCC','CCCC','DD','DDD','DDDD'],
'GROUP': ['','','','','','','','','','','','']
}
df2 = pd.DataFrame(df2,columns=['NAME','GROUP'])
   NAME GROUP
0    AA
1   AAA
2  AAAA
3    BB
4   BBB
5  BBBB
6    CC
7   CCC
8  CCCC
9    DD
10  DDD
11 DDDD
My task is to set GROUP in df2 according to the NAME in df1.
I think I need to use contains: if a value of df1['NAME'] is contained in df2['NAME'], set GROUP to the GROUP of that row in df1. I tried using a loop and converting the DataFrames into arrays, but it didn't help.
Use Series.str.extract to create a matching column you can merge on, then bring the group over with a merge. Drop the 'GROUP' column that already exists in df2 before the merge. I left the 'match' column in for clarity.
If a name contains several of the substrings, .str.extract keeps only the first match, so the merge uses only that one. (Multiple matches can be handled with .str.extractall and a groupby to combine everything into, say, a list.)
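# Build an alternation pattern with a single capture group: '(A|B|C|D)'
pat = '(' + '|'.join(df1['NAME']) + ')'
# First df1 name found in each df2 name (NaN if there is none)
df2['match'] = df2['NAME'].str.extract(pat, expand=False)
# Drop the empty GROUP column, then pull GROUP from df1 by joining on 'match'
df2 = df2.drop(columns='GROUP').merge(df1.rename(columns={'NAME': 'match'}), on='match', how='left')
print(df2)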
   NAME match GROUP
0    AA     A    A1
1   AAA     A    A1
2  AAAA     A    A1
3    BB     B    B1
4   BBB     B    B1
5  BBBB     B    B1
6    CC     C    C1
7   CCC     C    C1
8  CCCC     C    C1
9    DD     D    D1
10  DDD     D    D1
11 DDDD     D    D1
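For the multiple-match case mentioned above, here is a minimal sketch of the .str.extractall plus groupby idea. The small frames, the 'GROUPS' column name, and the 'row' index label are made up for illustration, and it assumes the names in df1 are unique. It also wraps each name in re.escape, which the code above does not do, so that names containing regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Illustrative data only: 'AB' contains two names from df1, 'ZZ' contains none.
df1 = pd.DataFrame({'NAME': ['A', 'B'], 'GROUP': ['A1', 'B1']})
df2 = pd.DataFrame({'NAME': ['AA', 'AB', 'BB', 'ZZ']})

# Escape each name so it is treated as literal text inside the pattern.
pat = '(' + '|'.join(map(re.escape, df1['NAME'])) + ')'

# extractall returns one row per match, indexed by (df2 row, match number).
# Keep the single capture group (column 0) and drop the match-number level.
found = df2['NAME'].str.extractall(pat)[0].droplevel(1)

# Translate each matched name into its group, then remove repeated
# (row, group) pairs such as the two 'A' hits in 'AA'.
pairs = (found.map(df1.set_index('NAME')['GROUP'])
              .rename('GROUP')
              .rename_axis('row')
              .reset_index()
              .drop_duplicates())

# Collect the groups of each df2 row into a list; assignment aligns on the
# index, so rows without any match end up as NaN.
df2['GROUPS'] = pairs.groupby('row')['GROUP'].agg(list)
print(df2)
```

Whether a list per row is the right shape depends on what you do next; exploding it back into one row per group, or joining it into a string, are equally reasonable choices.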