Let's suppose I have a dataframe:
import numpy as np
import pandas as pd
a = [['A',np.nan,2,'x|x|x|y'],['B','a|b',56,'b|c'],['C','c|e|e',65,'f|g'],['D','h',98,'j'],['E','g',98,'k|h'],['F','a|a|a|a|a|b',98,np.nan],['G','w',98,'p'],['H','s',98,'t|u']]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
1 2 3 4
0 A NaN 2 x|x|x|y
1 B a|b 56 b|c
2 C c|e|e 65 f|g
3 D h 98 j
4 E g 98 k|h
5 F a|a|a|a|a|b 98 NaN
6 G w 98 p
7 H s 98 t|u
and another dataframe:
a = [['x'],['b'],['h'],['v']]
df2 = pd.DataFrame(a, columns=['1'])
df2
1
0 x
1 b
2 h
3 v
I want to compare column 1 of df2 with columns 2 and 4 of df1 (after splitting them on "|"). If a value matches column 2, column 4, or both, I want to extract only those rows of df1 into another dataframe, with an added column holding the df2 value that matched. For example, the result would look something like this:
1 2 3 4 5
0 A NaN 2 x|x|x|y x
1 B a|b 56 b|c b
2 F a|a|a|a|a|b 98 NaN b
3 D h 98 j h
4 E g 98 k|h h
The solution is to join the values of both columns into a single Series with DataFrame.agg, split it with Series.str.split, keep only the matching values with DataFrame.where and DataFrame.isin, join the remaining values together without NaNs, and finally filter out the rows where the new column is an empty string:
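# Join columns 2 and 4 into one '|'-separated string per row (NaN -> ''), then split into one token per cell
df11 = df1[['2','4']].fillna('').agg('|'.join, axis=1).str.split('|', expand=True)
# Keep only tokens present in df2['1'] (others become NaN), then join the unique matches of each row
df1['5'] = (df11.where(df11.isin(df2['1'].tolist()))
            .apply(lambda x: ','.join(sorted(set(x.dropna()))), axis=1))
# Rows without any match end up with an empty string, so drop them
df1 = df1[df1['5'].ne('')]
print(df1)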
1 2 3 4 5
0 A NaN 2 x|x|x|y x
1 B a|b 56 b|c b
3 D h 98 j h
4 E g 98 k|h h
5 F a|a|a|a|a|b 98 NaN b
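As an alternative sketch (a swapped-in technique, not the method used in the answer above), each column can be split and expanded into long format with Series.explode, so that every token keeps the index label of its original row; the matches are then grouped back to one value per row. This assumes pandas 0.25 or later (for `explode`) and rebuilds the sample data so it runs on its own; the names `tokens`, `hits`, `matched` and `result` are purely illustrative.

```python
import numpy as np
import pandas as pd

a = [['A', np.nan, 2, 'x|x|x|y'], ['B', 'a|b', 56, 'b|c'], ['C', 'c|e|e', 65, 'f|g'],
     ['D', 'h', 98, 'j'], ['E', 'g', 98, 'k|h'], ['F', 'a|a|a|a|a|b', 98, np.nan],
     ['G', 'w', 98, 'p'], ['H', 's', 98, 't|u']]
df1 = pd.DataFrame(a, columns=['1', '2', '3', '4'])
df2 = pd.DataFrame([['x'], ['b'], ['h'], ['v']], columns=['1'])

# Split columns 2 and 4 on '|' and explode to one token per row;
# the index repeats the original df1 row label, NaN cells stay NaN
tokens = pd.concat([df1[c].str.split('|').explode() for c in ['2', '4']])

# Keep only tokens that appear in df2['1'] (NaN never matches)
hits = tokens[tokens.isin(df2['1'])]

# Collapse back to one string per original row;
# sorted(set(...)) removes duplicates and gives a deterministic order
matched = hits.groupby(level=0).agg(lambda s: ','.join(sorted(set(s))))

# Select the matching rows of df1 and attach the matched values as column '5'
result = df1.loc[matched.index].copy()
result['5'] = matched
print(result)
```

Compared with the `expand=True` approach, this avoids building a wide intermediate frame padded with None, at the cost of a groupby to reassemble the rows.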