Merge columns based on partial match with pandas
I have two DataFrames that I want to merge on two conditions:
- Exact match on column X.
- The numbers in Y and Z in pdf should fall within the range of those in odf, even if only partially.
#odf
X Y Z
b1 s1 3 19
b2 s1 5 300
b4 s3 500 550
b6 s5 5 25
#pdf
X Y Z
d3 s2 7 12 #wrong s
d6 s1 50 220 #match b2 above
d7 s3 503 509 #match b4 above
d16 s5 15 30 #accept match to b6, partial match in Y/Z.
d18 s5 4 15 #accept match to b6
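For anyone who wants to reproduce this, the two frames above could be built roughly as follows. This is only a sketch: it assumes the b*/d* labels are the index of each frame and that Y and Z are integers, as the printed tables suggest.

```python
import pandas as pd

# Sample data copied from the tables above; b*/d* labels are assumed to be the index.
odf = pd.DataFrame(
    {"X": ["s1", "s1", "s3", "s5"],
     "Y": [3, 5, 500, 5],
     "Z": [19, 300, 550, 25]},
    index=["b1", "b2", "b4", "b6"],
)

pdf = pd.DataFrame(
    {"X": ["s2", "s1", "s3", "s5", "s5"],
     "Y": [7, 50, 503, 15, 4],
     "Z": [12, 220, 509, 30, 15]},
    index=["d3", "d6", "d7", "d16", "d18"],
)

print(odf)
print(pdf)
```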
In this case, I would get:
#iodf and ipdf are indices of the two dfs above
iodf X Yodf Zodf ipdf Ypdf Zpdf
b2 s1 5 300 d6 50 220
b4 s3 500 550 d7 503 509
b6 s5 5 25 d16 15 30
b6 s5 5 25 d18 4 15
I was thinking about creating an additional column with a regex in each df, and merging them based on that regex.
odf.loc[:, 'id'] = odf.X + '_' + odf.Y.astype(str) + '_' + odf.Z.astype(str)
pdf.loc[:, 'id'] = pdf.X + '_' + pdf.Y.astype(str) + '_' + pdf.Z.astype(str)
The issue is that I would then need to specify the values for Y and Z as ranges, and I'm not entirely sure how to go about that. Any suggestions? Thanks a lot in advance!
IIUC, you can do the following:
df = odf.reset_index().merge(pdf.reset_index(), on='X', suffixes=('odf','pdf'))
cleaned = df[(df['Ypdf'].between(df['Yodf'], df['Zodf'])) | (df['Zpdf'].between(df['Yodf'], df['Zodf']))]
Yields:
indexodf X Yodf Zodf indexpdf Ypdf Zpdf
1 b2 s1 5 300 d6 50 220
2 b4 s3 500 550 d7 503 509
3 b6 s5 5 25 d16 15 30
4 b6 s5 5 25 d18 4 15
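One caveat with the filter above: it checks whether either endpoint of the pdf range lies inside the odf range, so a pdf range that completely contains an odf range would be dropped. If such rows should also count as partial matches (an assumption, since the question does not say), a general interval-overlap test could be used instead. The sketch below assumes Y <= Z in every row; `merge_overlapping` is a hypothetical helper name, and the `d99` row is made-up data added only to show the containment case.

```python
import pandas as pd

odf = pd.DataFrame(
    {"X": ["s1", "s1", "s3", "s5"], "Y": [3, 5, 500, 5], "Z": [19, 300, 550, 25]},
    index=["b1", "b2", "b4", "b6"],
)
pdf = pd.DataFrame(
    {"X": ["s2", "s1", "s3", "s5", "s5", "s1"],
     "Y": [7, 50, 503, 15, 4, 1],
     "Z": [12, 220, 509, 30, 15, 1000]},
    # d99 is a hypothetical row (not in the question): its range contains b1's and b2's.
    index=["d3", "d6", "d7", "d16", "d18", "d99"],
)


def merge_overlapping(odf, pdf):
    """Pair rows with equal X whose [Y, Z] intervals overlap (assumes Y <= Z)."""
    merged = odf.reset_index().merge(
        pdf.reset_index(), on="X", suffixes=("odf", "pdf")
    )
    # Two closed intervals overlap when each one starts before the other ends.
    overlaps = (merged["Ypdf"] <= merged["Zodf"]) & (merged["Zpdf"] >= merged["Yodf"])
    return merged[overlaps]


result = merge_overlapping(odf, pdf)
print(result)
```

On the original sample rows this selects the same pairs as the `between` filter; the difference only shows up for rows like `d99`.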