I have two DataFrames, odf and pdf, that I want to merge by:
- Exact match on column X.
- The numbers in Y and Z in pdf should be within the range of those in odf, even if only partially.
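One way to read "within range, even if only partially" is as a closed-interval overlap test: the pdf range [Y, Z] overlaps the odf range [Y, Z] when each one starts no later than the other one ends. The sketch below assumes both endpoints are inclusive, and the helper name `overlaps` is hypothetical:

```python
def overlaps(lo_a, hi_a, lo_b, hi_b):
    """Return True if the closed intervals [lo_a, hi_a] and [lo_b, hi_b] share at least one point."""
    return lo_a <= hi_b and lo_b <= hi_a


# Values taken from the example tables in this question
print(overlaps(15, 30, 5, 25))    # d16 vs b6: partial overlap -> True
print(overlaps(50, 220, 3, 19))   # d6 vs b1: disjoint -> False
print(overlaps(50, 220, 5, 300))  # d6 vs b2: fully inside -> True
```

The same comparison also accepts a pdf range that completely contains the odf range, which a check on individual endpoints would miss.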
#odf
X Y Z
b1 s1 3 19
b2 s1 5 300
b4 s3 500 550
b6 s5 5 25
#pdf
X Y Z
d3 s2 7 12 #no match: there is no s2 in odf
d6 s1 50 220 #match b2 above
d7 s3 503 509 #match b4 above
d16 s5 15 30 #accept match to b6, partial match in Y/Z.
d18 s5 4 15 #accept match to b6
In this case, I would get:
#iodf and ipdf are indices of the two dfs above
iodf X Yodf Zodf ipdf Ypdf Zpdf
b2 s1 5 300 d6 50 220
b4 s3 500 550 d7 503 509
b6 s5 5 25 d16 15 30
b6 s5 5 25 d18 4 15
I was thinking about creating an additional column with a regex in each df, and merging them based on that regex.
# Build a composite string key per row, e.g. 's1_5_300'
odf['id'] = odf.X + '_' + odf.Y.astype(str) + '_' + odf.Z.astype(str)
pdf['id'] = pdf.X + '_' + pdf.Y.astype(str) + '_' + pdf.Z.astype(str)
The issue is that I would then need to express the values of Y and Z as ranges, and I'm not sure how to go about that. Any suggestions? Thanks a lot in advance!
IIUC, you can do the following:
df = odf.reset_index().merge(pdf.reset_index(), on='X', suffixes=('odf','pdf'))
cleaned = df[(df['Ypdf'].between(df['Yodf'], df['Zodf'])) | (df['Zpdf'].between(df['Yodf'], df['Zodf']))]
Yields:
indexodf X Yodf Zodf indexpdf Ypdf Zpdf
1 b2 s1 5 300 d6 50 220
2 b4 s3 500 550 d7 503 509
3 b6 s5 5 25 d16 15 30
4 b6 s5 5 25 d18 4 15
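The `between` filter above keeps a pair when either pdf endpoint falls inside the odf range, which gives the expected output for this data. It would, however, drop a pdf range that completely contains an odf range (for example Y=1, Z=400 against b2's 5 to 300), because neither endpoint lies inside. If such rows should also count as a partial match, a general interval-overlap test covers every case. The following self-contained sketch assumes that the first column in each table is the index and that both bounds are inclusive. The helper `merge_overlapping` and the extra row `d99` are hypothetical and serve only to illustrate the containment case:

```python
import pandas as pd

# Sample data from the question; the first column of each table is the index.
odf = pd.DataFrame(
    {"X": ["s1", "s1", "s3", "s5"], "Y": [3, 5, 500, 5], "Z": [19, 300, 550, 25]},
    index=["b1", "b2", "b4", "b6"],
)
pdf = pd.DataFrame(
    {"X": ["s2", "s1", "s3", "s5", "s5"], "Y": [7, 50, 503, 15, 4], "Z": [12, 220, 509, 30, 15]},
    index=["d3", "d6", "d7", "d16", "d18"],
)


def merge_overlapping(odf, pdf):
    """Inner-merge on X, then keep pairs whose closed [Y, Z] ranges overlap (hypothetical helper)."""
    df = odf.reset_index().merge(pdf.reset_index(), on="X", suffixes=("odf", "pdf"))
    # Two closed intervals overlap iff each starts no later than the other ends.
    # Unlike the endpoint-based `between` check, this also keeps containing ranges.
    mask = (df["Ypdf"] <= df["Zodf"]) & (df["Zpdf"] >= df["Yodf"])
    return df[mask]


cleaned = merge_overlapping(odf, pdf)
print(cleaned)

# Hypothetical extra row d99 whose range (1-400) contains both b1's and b2's ranges:
# the `between` filter would drop these pairs, the overlap test keeps them.
extra_row = pd.DataFrame({"X": ["s1"], "Y": [1], "Z": [400]}, index=["d99"])
pdf_extra = pd.concat([pdf, extra_row])
print(merge_overlapping(odf, pdf_extra)[["indexodf", "indexpdf"]])
```

On the question's data this keeps the same four pairs as the `between` version. The two approaches only differ when a pdf range is wider than the odf range on both sides.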