How to compare two columns in two dataframes using a complex condition
Let's suppose I have a dataframe:
import numpy as np
import pandas as pd
a = [['A',np.nan,2,'x|x|x|y'],['B','a|b',56,'b|c'],['C','c|e|e',65,'f|g'],['D','h',98,'j'],['E','g',98,'k|h'],['F','a|a|a|a|a|b',98,np.nan],['G','w',98,'p'],['H','s',98,'t|u']]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
   1            2   3        4
0  A          NaN   2  x|x|x|y
1  B          a|b  56      b|c
2  C        c|e|e  65      f|g
3  D            h  98        j
4  E            g  98      k|h
5  F  a|a|a|a|a|b  98      NaN
6  G            w  98        p
7  H            s  98      t|u
And another dataframe:
a = [['x'],['b'],['h'],['v']]
df2 = pd.DataFrame(a, columns=['1'])
df2
   1
0  x
1  b
2  h
3  v
I want to compare column 1 of df2 with columns 2 and 4 of df1, after splitting their values on "|". If a df2 value matches an element of column 2, column 4, or both, I want to extract only those rows of df1 into another dataframe, with an added column holding the df2 value that matched. For example, the result would look something like this:
   1            2   3        4  5
0  A          NaN   2  x|x|x|y  x
1  B          a|b  56      b|c  b
2  F  a|a|a|a|a|b  98      NaN  b
3  D            h  98        j  h
4  E            g  98      k|h  h
One solution is to join the values of both columns into a single Series with DataFrame.agg, split that Series with Series.str.split, keep only the matching values using DataFrame.where with DataFrame.isin, and then join the remaining values together without the NaNs. Finally, filter out the rows where the new column is an empty string:
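# Join columns 2 and 4 into one "|"-separated string per row, then split into one value per column
df11 = df1[['2','4']].fillna('').agg('|'.join, axis=1).str.split('|', expand=True)
# Keep only the values present in df2; join the distinct matches (sorted, so the order is deterministic)
df1['5'] = (df11.where(df11.isin(df2['1'].tolist()))
                .apply(lambda x: ','.join(sorted(set(x.dropna()))), axis=1))
# Drop the rows that had no match at all
df1 = df1[df1['5'].ne('')]
print(df1)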
   1            2   3        4  5
0  A          NaN   2  x|x|x|y  x
1  B          a|b  56      b|c  b
3  D            h  98        j  h
4  E            g  98      k|h  h
5  F  a|a|a|a|a|b  98      NaN  b
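As an alternative sketch (not part of the answer above), the same matching can be done in long form with Series.explode instead of a wide expand=True frame. This assumes pandas 0.25 or later, where Series.explode is available; the names rows, tokens, matched and result are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
rows = [['A', np.nan, 2, 'x|x|x|y'], ['B', 'a|b', 56, 'b|c'],
        ['C', 'c|e|e', 65, 'f|g'], ['D', 'h', 98, 'j'],
        ['E', 'g', 98, 'k|h'], ['F', 'a|a|a|a|a|b', 98, np.nan],
        ['G', 'w', 98, 'p'], ['H', 's', 98, 't|u']]
df1 = pd.DataFrame(rows, columns=['1', '2', '3', '4'])
df2 = pd.DataFrame([['x'], ['b'], ['h'], ['v']], columns=['1'])

# Join columns 2 and 4, split on "|", and explode to one token per row.
# The original row label of df1 is repeated in the index of `tokens`.
tokens = (df1[['2', '4']].fillna('')
          .agg('|'.join, axis=1)
          .str.split('|')
          .explode())

# Keep the tokens found in df2, then collapse back to one string per df1 row.
matched = (tokens[tokens.isin(df2['1'])]
           .groupby(level=0)
           .agg(lambda s: ','.join(sorted(set(s)))))

# Rows without any match are absent from `matched`, so an inner join drops them.
result = df1.join(matched.rename('5'), how='inner')
print(result)
```

Unlike the version above, this leaves df1 unmodified and builds the filtered frame separately, which may be preferable if df1 is needed again later.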