
Merge dataframes by closest coordinates

Imagine we have two dataframes, each with coordinate columns ['X', 'Y']:

df1 :

 X            Y          House №
2531        2016           175
2219        2196           11
2901        3426           201
6901        4431           46
7891        1126           89

df2 :

 X            Y      Delivery office №
2534        2019            O1
6911        4421            O2
2901        3426            O3
7894.5      1120            O4 

My idea is to merge them and get:

df3

 X            Y          House №    Delivery office №
2531        2016           175            O1
2219        2196           11             NA
2901        3426           201            O3
6901        4431           46             O2
7891        1126           89             O4

So we want to perform a 'fuzzy' merge with a distance threshold (this parameter should be supplied by the user). Note that house number 11 did not get a delivery office number because it is located too far away from every office in df2.

So I need each row of df2 to 'find' its closest row in df1 and add its 'Cost' value to it. The usual built-in pd.merge does not work here, and neither do custom packages that implement fuzzy matching on string values using Levenshtein distance and the like.

There is no silver bullet, but one way to do this is to turn the Y values into categories using pd.cut, which places the values in discrete bins. You need to tune the number of bins manually; for example, set it to 20.
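To get a feel for how the number of bins acts as a rough threshold, here is a minimal sketch using the Y values of df1 from the question (the names y, fine and coarse are only illustrative). With labels=False, pd.cut returns the integer index of the bin each value falls into, so fewer bins means that values farther apart end up with the same label.

```python
import pandas as pd

y = pd.Series([2016, 2196, 3426, 4431, 1126])

# 20 equal-width bins over the range of y: 2016 and 2196 get different labels
fine = pd.cut(y, 20, labels=False)
print(fine.tolist())    # [5, 6, 13, 19, 0]

# 5 wider bins: 2016 and 2196 now share a label and would be matched together
coarse = pd.cut(y, 5, labels=False)
print(coarse.tolist())  # [1, 1, 3, 4, 0]
```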

Load the data:

import pandas as pd

df1 = pd.DataFrame({'X':[2531, 2219, 2901, 6901, 7891], 'Y':[2016, 2196, 3426, 4431, 1126], 'House':['A', 'B', 'J', 'A', 'A']})

df2 = pd.DataFrame({'X':[2534, 6911, 2901, 7894.5], 'Y':[2019, 4421, 3426, 1120], 'Cost':[1200, 3100, 800, 600]})

Make new categories:

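# Bin Y into 20 equal-width intervals; labels=False returns the integer bin index
df1['Y2'] = pd.cut(df1['Y'], 20, labels=False)

df2['Y2'] = pd.cut(df2['Y'], 20, labels=False)

# Left merge on the bin index; df1 rows with no df2 row in the same bin get NaN
df3 = pd.merge(df1, df2, on=['Y2'], how='left')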
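Binning only looks at Y, and two nearby points can still land on opposite sides of a bin edge. If an explicit distance threshold is wanted, as the question asks, one possible alternative is a brute-force nearest-neighbour match on both coordinates. The sketch below is not part of the answer above: the helper name merge_nearest, its parameters and the threshold of 15 are assumptions chosen for illustration, and it assumes that the two frames share no column names other than the coordinates.

```python
import numpy as np
import pandas as pd

def merge_nearest(left, right, threshold, cols=("X", "Y")):
    """Hypothetical helper: attach to each row of `left` the closest row of
    `right` (Euclidean distance over `cols`); rows of `left` with nothing
    within `threshold` get NaN in the columns coming from `right`."""
    cols = list(cols)
    a = left[cols].to_numpy(dtype=float)
    b = right[cols].to_numpy(dtype=float)
    # Pairwise distance matrix of shape (len(left), len(right))
    dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)                    # closest right row per left row
    nearest_dist = dist[np.arange(len(a)), nearest]
    # Pick the matched right rows (without coordinates) and align them to left
    matched = right.drop(columns=cols).iloc[nearest]
    matched.index = left.index
    # Keep only matches within the threshold; the join leaves NaN for the rest
    within = nearest_dist <= threshold
    return left.join(matched[within])

df1 = pd.DataFrame({'X': [2531, 2219, 2901, 6901, 7891],
                    'Y': [2016, 2196, 3426, 4431, 1126],
                    'House': ['A', 'B', 'J', 'A', 'A']})
df2 = pd.DataFrame({'X': [2534, 6911, 2901, 7894.5],
                    'Y': [2019, 4421, 3426, 1120],
                    'Cost': [1200, 3100, 800, 600]})

result = merge_nearest(df1, df2, threshold=15)
print(result)  # the row for house 'B' has NaN in Cost
```

The distance matrix needs len(left) * len(right) entries, so this only suits small to medium frames. For larger data a spatial index such as scipy.spatial.cKDTree would probably be a better fit.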


 