How to compare two columns in two dataframes using a complex condition

Let's suppose I have a dataframe:

import numpy as np
import pandas as pd
a = [['A',np.nan,2,'x|x|x|y'],['B','a|b',56,'b|c'],['C','c|e|e',65,'f|g'],['D','h',98,'j'],['E','g',98,'k|h'],['F','a|a|a|a|a|b',98,np.nan],['G','w',98,'p'],['H','s',98,'t|u']]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
   1            2   3        4
0  A          NaN   2  x|x|x|y
1  B          a|b  56      b|c
2  C        c|e|e  65      f|g
3  D            h  98        j
4  E            g  98      k|h
5  F  a|a|a|a|a|b  98      NaN
6  G            w  98        p
7  H            s  98      t|u

and another dataframe:

a = [['x'],['b'],['h'],['v']]
df2 = pd.DataFrame(a, columns=['1'])
df2

    1
0   x
1   b
2   h
3   v

I want to compare column 1 of df2 with columns 2 and 4 of df1, after splitting the values in those two columns on "|". If a df2 value matches a token in column 2, in column 4, or in both, I want to extract just those rows of df1 into another dataframe, with an added column holding the df2 value that matched. For example, the result would look something like this:

   1            2   3        4  5
0  A          NaN   2  x|x|x|y  x
1  B          a|b  56      b|c  b
2  F  a|a|a|a|a|b  98      NaN  b
3  D            h  98        j  h
4  E            g  98      k|h  h

The solution is to join the values of both columns into a single Series with DataFrame.agg, split that Series with Series.str.split, keep only the matching values using DataFrame.where with DataFrame.isin, and then join the remaining values of each row together, skipping NaNs. Finally, filter out the rows where the new column is an empty string:

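# Join columns 2 and 4 into one "|"-separated string per row, then split into one token per column
df11 = df1[['2', '4']].fillna('').agg('|'.join, axis=1).str.split('|', expand=True)
# Keep only tokens that occur in df2['1'], then join the distinct matches of each row
df1['5'] = (df11.where(df11.isin(df2['1'].tolist()))
                .apply(lambda x: ','.join(set(x.dropna())), axis=1))

# Drop the rows where nothing matched
df1 = df1[df1['5'].ne('')]
print(df1)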
   1            2   3        4  5
0  A          NaN   2  x|x|x|y  x
1  B          a|b  56      b|c  b
3  D            h  98        j  h
4  E            g  98      k|h  h
5  F  a|a|a|a|a|b  98      NaN  b
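
As a rough alternative sketch, not part of the answer above, the same matching can be written with plain Python sets per row, which may be easier to follow on small frames. The helper name match_rows and its parameters (cols, key, out, sep) are hypothetical, introduced only for illustration. Matches are sorted before joining so that a row with several hits always yields the same string.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df1 = pd.DataFrame(
    [['A', np.nan, 2, 'x|x|x|y'], ['B', 'a|b', 56, 'b|c'],
     ['C', 'c|e|e', 65, 'f|g'], ['D', 'h', 98, 'j'],
     ['E', 'g', 98, 'k|h'], ['F', 'a|a|a|a|a|b', 98, np.nan],
     ['G', 'w', 98, 'p'], ['H', 's', 98, 't|u']],
    columns=['1', '2', '3', '4'])
df2 = pd.DataFrame([['x'], ['b'], ['h'], ['v']], columns=['1'])


def match_rows(left, right, cols=('2', '4'), key='1', out='5', sep='|'):
    """Hypothetical helper: keep rows of `left` with a token of `cols` in right[key]."""
    wanted = set(right[key].dropna())

    def matched(row):
        tokens = set()
        for value in row:
            if isinstance(value, str):  # NaN cells are floats, so they are skipped
                tokens.update(value.split(sep))
        # Sort so the joined string does not depend on set ordering
        return ','.join(sorted(tokens & wanted))

    result = left.copy()  # leave the caller's frame untouched
    result[out] = result[list(cols)].apply(matched, axis=1)
    return result[result[out].ne('')]


result = match_rows(df1, df2)
print(result)
```

On the sample data this keeps rows A, B, D, E and F with their original index labels. Unlike the snippet above, it returns a new frame instead of overwriting df1.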
