Merge columns based on partial match with pandas

I have two DataFrames that I want to merge on two conditions:
- Exact match on column X.
- The Y and Z values in pdf should fall within the Y to Z range in odf, even if only partially.

#odf
      X     Y    Z 
b1    s1    3    19
b2    s1    5    300
b4    s3    500  550
b6    s5    5    25

#pdf
      X     Y    Z
d3    s2    7    12   #wrong s
d6    s1    50   220  #match b2 above 
d7    s3    503  509  #match b4 above
d16   s5    15   30   #accept match to b6, partial match in Y/Z.
d18   s5    4    15   #accept match to b6  

In this case, I would get:

#iodf and ipdf are indices of the two dfs above
iodf    X     Yodf    Zodf   ipdf    Ypdf   Zpdf
b2      s1    5       300    d6      50     220   
b4      s3    500     550    d7      503    509
b6      s5    5       25     d16     15     30 
b6      s5    5       25     d18     4      15

I was thinking about creating an additional column with a regex in each df, and merging them based on that regex.

odf.loc[:, 'id'] = odf.X + '_' + odf.Y.astype(str) + '_' + odf.Z.astype(str)
pdf.loc[:, 'id'] = pdf.X + '_' + pdf.Y.astype(str) + '_' + pdf.Z.astype(str)

The issue is that I would then need to express the values of Y and Z as ranges, and I'm not sure how to go about that. Any suggestions? Thanks a lot in advance!
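
One way to read the "partial match" requirement is as an interval-overlap test on plain numbers, with no regex involved: two ranges overlap when each one starts no later than the other ends. The sketch below checks that reading against a few of the sample rows. The function name `ranges_overlap` is hypothetical, and treating the endpoints as inclusive is an assumption rather than something the question states.

```python
def ranges_overlap(y1, z1, y2, z2):
    """Return True if the closed ranges [y1, z1] and [y2, z2] share any point."""
    return y1 <= z2 and y2 <= z1


# odf b6 is (5, 25); pdf d16 is (15, 30) and d18 is (4, 15): both overlap it partially.
print(ranges_overlap(5, 25, 15, 30))
print(ranges_overlap(5, 25, 4, 15))

# odf b1 is (3, 19); pdf d6 is (50, 220): the ranges do not touch.
print(ranges_overlap(3, 19, 50, 220))
```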

IIUC, you can do the following:

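# Pair every odf row with every pdf row that shares the same X.
df = odf.reset_index().merge(pdf.reset_index(), on='X', suffixes=('odf', 'pdf'))

# Keep pairs where either pdf endpoint falls inside the odf range.
cleaned = df[df['Ypdf'].between(df['Yodf'], df['Zodf']) | df['Zpdf'].between(df['Yodf'], df['Zodf'])]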

Yields:

  indexodf   X  Yodf  Zodf indexpdf  Ypdf  Zpdf
1       b2  s1     5   300       d6    50   220
2       b4  s3   500   550       d7   503   509
3       b6  s5     5    25      d16    15    30
4       b6  s5     5    25      d18     4    15
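
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the two sample frames from the question and filters the merged pairs. Note that it swaps the `between` checks for a symmetric overlap condition (each range starts no later than the other ends). This is a variation, not the filter shown above: checking only whether a pdf endpoint lies inside the odf range would miss a pdf range that fully contains the odf range (say Y=1, Z=400 against b2's 5 to 300). Inclusive endpoints are an assumption, and the names `merged`, `overlaps` and `result` are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question.
odf = pd.DataFrame(
    {"X": ["s1", "s1", "s3", "s5"], "Y": [3, 5, 500, 5], "Z": [19, 300, 550, 25]},
    index=["b1", "b2", "b4", "b6"],
)
pdf = pd.DataFrame(
    {"X": ["s2", "s1", "s3", "s5", "s5"], "Y": [7, 50, 503, 15, 4], "Z": [12, 220, 509, 30, 15]},
    index=["d3", "d6", "d7", "d16", "d18"],
)

# Pair every odf row with every pdf row that has the same X.
# The unnamed indices become 'indexodf' and 'indexpdf' after reset_index + suffixes.
merged = odf.reset_index().merge(pdf.reset_index(), on="X", suffixes=("odf", "pdf"))

# Keep pairs whose closed ranges intersect: each range starts no later than the other ends.
overlaps = (merged["Ypdf"] <= merged["Zodf"]) & (merged["Yodf"] <= merged["Zpdf"])
result = merged[overlaps]

print(result)
```

On the sample data this keeps the same four pairs as the `between` version, since none of the pdf ranges fully contains an odf range.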
