
Iterate over each row of a DataFrame against every value of a column in another DataFrame

I'm trying to compare each row of df1 with every row of df2, then create a new column in df1 that stores the minimum of all the resulting values.

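lat_sc = shopping_centers['lat']
long_sc = shopping_centers['lng']
short_dist = []
# Outer loop over properties, inner loop over shopping centres
for lat_real, long_real in zip(real_estate['lat'], real_estate['lng']):
    euclid_dist = []
    for i, j in zip(lat_sc, long_sc):
        # Latitude difference only; longitude (j, long_real) is not used yet
        euclid_dist.append(lat_real - i)
    # One minimum per property, taken after all centres have been seen
    short_dist.append(min(euclid_dist))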

Desired result: df1['shortest'] = min(df1['lat'] - each lat of df2)

df1['nearest sc'] = that corresponding sc_id

Edit to include sc_id in df1

This could get computationally intensive as df2 gets big, but you can compute the distance from each df1 row to every df2 row and keep the minimum, like this (it's possible to do this more efficiently):

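import numpy as np

# Coordinates of every row in df2, as NumPy arrays for vectorised arithmetic
ref_lats = df2["lat"].values
ref_longs = df2["lng"].values

def find_euclid_dist(row):
    # Distance from this df1 row to every df2 row; keep the smallest
    dist_arr = np.sqrt((ref_lats - row["lat"])**2 + (ref_longs - row["lng"])**2)
    return np.min(dist_arr)

df1["shortest"] = df1.apply(find_euclid_dist, axis=1)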
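The "more efficiently" remark can be made concrete with NumPy broadcasting, which builds the whole distance matrix in one step instead of calling a function once per row. This is only a sketch, not the answerer's code: the sample coordinates and sc_id values are made up for illustration, and it assumes df2 carries an sc_id column as the question describes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; replace with the real frames.
df1 = pd.DataFrame({"lat": [0.0, 3.0], "lng": [0.0, 4.0]})
df2 = pd.DataFrame({"sc_id": ["SC_A", "SC_B"], "lat": [3.0, 0.0], "lng": [4.0, 1.0]})

# A (len(df1), 1) column minus a (len(df2),) vector broadcasts
# to a (len(df1), len(df2)) matrix of pairwise differences.
d_lat = df1["lat"].to_numpy()[:, None] - df2["lat"].to_numpy()
d_lng = df1["lng"].to_numpy()[:, None] - df2["lng"].to_numpy()
dist = np.sqrt(d_lat**2 + d_lng**2)

# Smallest distance in each row of the matrix.
df1["shortest"] = dist.min(axis=1)
# argmin gives the position of the closest df2 row, used to look up its sc_id.
df1["nearest sc"] = df2["sc_id"].to_numpy()[dist.argmin(axis=1)]
```

The matrix holds len(df1) × len(df2) floats, so for very large frames it may be necessary to process df1 in chunks.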

How about using cdist from scipy?

from scipy.spatial.distance import cdist

df1['shortest'] = cdist(df1[['lat','lng']], df2[['lat','lng']], metric='euclidean').min(1)

print(df1) returns:

         lat        lng          addr_street    shortest
0 -37.980523 -37.980523     37 Scarlet Drive  183.022436
1 -37.776161 -37.776161  999 Heidelberg Road  182.817951
2 -37.926238 -37.926238        47 New Street  182.968096
3 -37.800056 -37.800056  3/113 Normanby Road  182.841849
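The question also asks for the sc_id of the nearest shopping centre. One way to get it, sketched here under the assumption that df2 has an sc_id column, is to keep the full cdist matrix and take argmin along each row. The sample values below are hypothetical and chosen only so the result is easy to check by hand.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical sample data; replace with the real frames.
df1 = pd.DataFrame({"lat": [1.0, 10.0], "lng": [1.0, 10.0]})
df2 = pd.DataFrame({"sc_id": ["SC_001", "SC_002"], "lat": [0.0, 10.0], "lng": [1.0, 13.0]})

# Pairwise Euclidean distances, shape (len(df1), len(df2)).
dist = cdist(df1[["lat", "lng"]].to_numpy(), df2[["lat", "lng"]].to_numpy(), metric="euclidean")

# Position of the closest df2 row for each df1 row.
nearest = dist.argmin(axis=1)

df1["shortest"] = dist.min(axis=1)
df1["nearest sc"] = df2["sc_id"].to_numpy()[nearest]
```

Indexing with the positional argmin result, rather than with df2's index labels, keeps the lookup correct even if df2 does not have a default 0..n-1 index.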


 